amyeroberts
Contributor

What does this PR do?

Adds support for processing a list of lists of images (one inner list per prompt), e.g. so that the following script runs:

import requests
from PIL import Image

from transformers import PixtralProcessor, PixtralImageProcessor, AutoTokenizer

url_0 = "https://www.ilankelman.org/stopsigns/australia.jpg"
image_0 = Image.open(requests.get(url_0, stream=True).raw)

url_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_1 = Image.open(requests.get(url_1, stream=True).raw)

url_2 = "https://www.ilankelman.org/stopsigns/australia.jpg"
image_2 = Image.open(requests.get(url_2, stream=True).raw)

image_processor = PixtralImageProcessor()
tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")
processor = PixtralProcessor(tokenizer=tokenizer, image_processor=image_processor, patch_size=16)

# single image processing
image_inputs = image_0
prompt = "USER: [IMG]\nWhat's the content of the image? ASSISTANT:"
inputs = processor(text=prompt, images=image_inputs, return_tensors="pt", padding=True)


# single list of images
prompt = ["USER: [IMG][IMG]\nWhat's the difference between these two images? ASSISTANT:"]
image_inputs = [image_0, image_1]
inputs = processor(text=prompt, images=image_inputs, return_tensors="pt", padding=True)

# batched list of images
prompt = ["USER: [IMG][IMG]\nWhat's the difference between these two images? ASSISTANT:", "USER: [IMG]\nWhat's the content of the image? ASSISTANT:"]
image_inputs = [[image_0, image_1], [image_2]]
inputs = processor(text=prompt, images=image_inputs, return_tensors="pt", padding=True)
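The three input shapes in the script (a single image, a flat list of images, and a list of lists) can all be reduced to one nested form before processing. The following is a minimal sketch of that idea, not the code in this PR: `to_nested_images` and `check_image_tokens` are hypothetical helper names, a flat list is assumed to mean "several images for one prompt" (as in the second case above), and plain strings stand in for PIL images so the snippet runs without downloading anything.

```python
from typing import Any, List, Sequence, Union


def is_image(obj: Any) -> bool:
    # Stand-in check for this sketch: anything that is not a list/tuple
    # is treated as a single image (a real check would test for PIL/array types).
    return not isinstance(obj, (list, tuple))


def to_nested_images(images: Any) -> List[List[Any]]:
    """Normalize image inputs to one inner list of images per prompt (hypothetical helper)."""
    if is_image(images):
        # Single image -> one prompt with one image.
        return [[images]]
    if len(images) == 0:
        raise ValueError("Expected at least one image.")
    if all(is_image(item) for item in images):
        # Flat list -> assumed to be one prompt with several images.
        return [list(images)]
    if all(not is_image(item) and all(is_image(i) for i in item) for item in images):
        # Already nested -> one inner list per prompt.
        return [list(item) for item in images]
    raise ValueError("Images must be a single image, a list of images, or a list of lists of images.")


def check_image_tokens(
    prompts: Union[str, Sequence[str]],
    nested_images: List[List[Any]],
    image_token: str = "[IMG]",
) -> None:
    """Check that each prompt has one image token per image (hypothetical helper)."""
    if isinstance(prompts, str):
        prompts = [prompts]
    if len(prompts) != len(nested_images):
        raise ValueError(f"Got {len(prompts)} prompts but {len(nested_images)} image lists.")
    for prompt, sample_images in zip(prompts, nested_images):
        if prompt.count(image_token) != len(sample_images):
            raise ValueError(f"Prompt has {prompt.count(image_token)} image tokens but {len(sample_images)} images.")


# Placeholders instead of PIL images, mirroring the batched case above.
prompts = [
    "USER: [IMG][IMG]\nWhat's the difference between these two images? ASSISTANT:",
    "USER: [IMG]\nWhat's the content of the image? ASSISTANT:",
]
nested = to_nested_images([["image_0", "image_1"], ["image_2"]])
check_image_tokens(prompts, nested)
```

Checking the token count per prompt up front gives a clear error when the prompts and the nested image lists are out of step, rather than a shape mismatch later on.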

@amyeroberts
Contributor Author

cc @ArthurZucker

Collaborator

@ArthurZucker ArthurZucker left a comment

Thankkkks

@ArthurZucker ArthurZucker merged commit 6ee62a7 into huggingface:add-pixtral Sep 13, 2024
12 of 21 checks passed
ArthurZucker added a commit that referenced this pull request Sep 14, 2024
* initial commit

* gloups

* updates

* work

* weights match

* nits

* nits

* updates to support the tokenizer :)

* updates

* Pixtral processor (#33454)

* rough outline

* Add in image break and end tokens

* Fix

* Undo some formatting changes

* Set patch_size default

* Fix

* Fix token expansion

* nit in conversion script

* Fix image token list creation

* done

* add expected results

* Process list of list of images (#33465)

* updates

* working image and processor

* this is the expected format

* some fixes

* push current updated

* working mult images!

* add a small integration test

* Update configuration docstring

* Formatting

* Config docstring fix

* simplify model test

* fixup modeling and tests

* Return BatchMixFeature in image processor

* fix some copies

* update

* nits

* Update model docstring

* Apply suggestions from code review

* Fix up

* updates

* revert modeling changes

* update

* update

* fix load safe

* add license

* update

* use pixel_values as required by the model

* skip some tests and refactor

* Add pixtral image processing tests (#33476)

* Image processing tests

* Add processing tests

* woops

* defaults reflect pixtral image processor

* fixup post merge

* images -> pixel values

* oups sorry Mr docbuilder

* isort

* fix

* fix processor tests

* small fixes

* nit

* update

* last nits

* oups this was really breaking!

* nits

* is composition needs to be true

---------

Co-authored-by: amyeroberts <[email protected]>
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024