Process list of list of images #33465

amyeroberts · 2024-09-13T10:18:13Z

What does this PR do?

Adds support for passing a list of lists of images to the processor, e.g. so that the following script runs:

import requests
from PIL import Image

from transformers import PixtralProcessor, PixtralImageProcessor, AutoTokenizer

url_0 = "https://www.ilankelman.org/stopsigns/australia.jpg"
image_0 = Image.open(requests.get(url_0, stream=True).raw)

url_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_1 = Image.open(requests.get(url_1, stream=True).raw)

url_2 = "https://www.ilankelman.org/stopsigns/australia.jpg"
image_2 = Image.open(requests.get(url_2, stream=True).raw)

image_processor = PixtralImageProcessor()
tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")
processor = PixtralProcessor(tokenizer=tokenizer, image_processor=image_processor, patch_size=16)

# single image processing
image_inputs = image_0
prompt = "USER: [IMG]\nWhat's the content of the image? ASSISTANT:"
inputs = processor(text=prompt, images=image_inputs, return_tensors="pt", padding=True)


# single list of images
prompt = ["USER: [IMG][IMG]\nWhat's the difference between these two images? ASSISTANT:"]
image_inputs = [image_0, image_1]
inputs = processor(text=prompt, images=image_inputs, return_tensors="pt", padding=True)

# batched list of images
prompt = ["USER: [IMG][IMG]\nWhat's the difference between these two images? ASSISTANT:", "USER: [IMG]\nWhat's the content of the image? ASSISTANT:"]
image_inputs = [[image_0, image_1], [image_2]]
inputs = processor(text=prompt, images=image_inputs, return_tensors="pt", padding=True)
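
The three call shapes in the script (a single image, a flat list, and a list of lists) can all be reduced to one list of images per prompt. The sketch below illustrates that normalization along with a check that each prompt has as many `[IMG]` placeholders as images. `nest_images` and `check_placeholders` are hypothetical helper names, not functions from this PR or from `transformers`, and treating a flat list as the images of a single prompt is an assumption inferred from the second example in the script.

```python
from typing import Any, List, Sequence, Union


def nest_images(images: Any) -> List[List[Any]]:
    """Normalize the accepted input shapes to one list of images per prompt.

    Hypothetical helper for illustration only; not the code merged in this PR.
    """
    is_sequence = isinstance(images, (list, tuple))
    # Already batched: a list of per-prompt lists, e.g. [[a, b], [c]].
    if is_sequence and images and all(isinstance(i, (list, tuple)) for i in images):
        return [list(i) for i in images]
    # Flat list: assumed to hold all images of a single prompt.
    if is_sequence:
        return [list(images)]
    # Single image: a batch of one prompt with one image.
    return [[images]]


def check_placeholders(
    prompts: Union[str, Sequence[str]],
    nested_images: List[List[Any]],
    image_token: str = "[IMG]",
) -> None:
    """Raise ValueError if placeholders and images do not line up per prompt."""
    if isinstance(prompts, str):
        prompts = [prompts]
    if len(prompts) != len(nested_images):
        raise ValueError(
            f"Got {len(prompts)} prompts but {len(nested_images)} image lists."
        )
    for index, (prompt, prompt_images) in enumerate(zip(prompts, nested_images)):
        found = prompt.count(image_token)
        if found != len(prompt_images):
            raise ValueError(
                f"Prompt {index} has {found} '{image_token}' placeholders "
                f"but {len(prompt_images)} images."
            )
```

With placeholder strings standing in for PIL images, `nest_images([["a", "b"], ["c"]])` is returned unchanged in shape, and `check_placeholders` accepts the batched prompts from the script because the first prompt contains two `[IMG]` tokens and the second contains one.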

HuggingFaceDocBuilderDev · 2024-09-13T10:37:07Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

amyeroberts · 2024-09-13T10:40:22Z

cc @ArthurZucker

ArthurZucker

Thankkkks

* initial commit
* gloups
* updates
* work
* weights match
* nits
* nits
* updates to support the tokenizer :)
* updates
* Pixtral processor (#33454)
* rough outline
* Add in image break and end tokens
* Fix
* Udo some formatting changes
* Set patch_size default
* Fix
* Fix token expansion
* nit in conversion script
* Fix image token list creation
* done
* add expected results
* Process list of list of images (#33465)
* updates
* working image and processor
* this is the expected format
* some fixes
* push current updated
* working mult images!
* add a small integration test
* Uodate configuration docstring
* Formatting
* Config docstring fix
* simplify model test
* fixup modeling and etests
* Return BatchMixFeature in image processor
* fix some copies
* update
* nits
* Update model docstring
* Apply suggestions from code review
* Fix up
* updates
* revert modeling changes
* update
* update
* fix load safe
* addd liscence
* update
* use pixel_values as required by the model
* skip some tests and refactor
* Add pixtral image processing tests (#33476)
* Image processing tests
* Add processing tests
* woops
* defaults reflect pixtral image processor
* fixup post merge
* images -> pixel values
* oups sorry Mr docbuilder
* isort
* fix
* fix processor tests
* small fixes
* nit
* update
* last nits
* oups this was really breaking!
* nits
* is composition needs to be true

---------

Co-authored-by: amyeroberts <[email protected]>

Process list of list of images

9fe6758

ArthurZucker approved these changes Sep 13, 2024

ArthurZucker merged commit 6ee62a7 into huggingface:add-pixtral Sep 13, 2024
12 of 21 checks passed
