add gallery for transforms v2 #7331

Merged 3 commits on Feb 24, 2023
109 changes: 109 additions & 0 deletions gallery/plot_transforms_v2.py
@@ -0,0 +1,109 @@
"""
==================================
Getting started with transforms v2
==================================

Most computer vision tasks are not supported out of the box by ``torchvision.transforms`` v1, since it only supports
images. ``torchvision.transforms.v2`` enables jointly transforming images, videos, bounding boxes, and masks. This
example showcases the core functionality of the new ``torchvision.transforms.v2`` API.
"""

import pathlib

import torch
import torchvision


def load_data():
    from torchvision.io import read_image
    from torchvision import datapoints
    from torchvision.ops import masks_to_boxes

    assets_directory = pathlib.Path("assets")

    path = assets_directory / "FudanPed00054.png"
    image = datapoints.Image(read_image(str(path)))
    merged_masks = read_image(str(assets_directory / "FudanPed00054_mask.png"))

    # the first unique value is the background, so we drop it to keep one label per instance
    labels = torch.unique(merged_masks)[1:]

    # split the merged mask into one boolean mask per instance
    masks = datapoints.Mask(merged_masks == labels.view(-1, 1, 1))

    bounding_boxes = datapoints.BoundingBox(
        masks_to_boxes(masks), format=datapoints.BoundingBoxFormat.XYXY, spatial_size=image.shape[-2:]
    )

    return path, image, bounding_boxes, masks, labels


########################################################################################################################
# The :mod:`torchvision.transforms.v2` API supports images, videos, bounding boxes, and instance and segmentation
# masks. Thus, it offers native support for many computer vision tasks, like image and video classification, object
# detection, or instance and semantic segmentation. Still, the interface is the same, making
# :mod:`torchvision.transforms.v2` a drop-in replacement for the existing :mod:`torchvision.transforms` API, aka v1.

# We are using BETA APIs, so we deactivate the associated warning, thereby acknowledging that
# some APIs may slightly change in the future.
torchvision.disable_beta_transforms_warning()
import torchvision.transforms.v2 as transforms

transform = transforms.Compose(
    [
        transforms.ColorJitter(contrast=0.5),
        transforms.RandomRotation(30),
        transforms.CenterCrop(480),
    ]
)
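
# A minimal sketch (not part of the original example) of the drop-in claim
# above: the same pipeline expressed with the v1 API, where only the import
# changes. This assumes an image-only use case, since v1 cannot transform
# boxes or masks.
import torchvision.transforms as v1_transforms

v1_transform = v1_transforms.Compose(
    [
        v1_transforms.ColorJitter(contrast=0.5),
        v1_transforms.RandomRotation(30),
        v1_transforms.CenterCrop(480),
    ]
)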

########################################################################################################################
# :mod:`torchvision.transforms.v2` natively supports jointly transforming multiple inputs while making sure that
# potential random behavior is consistent across all inputs. However, it doesn't enforce a specific input structure or
# order.

path, image, bounding_boxes, masks, labels = load_data()

torch.manual_seed(0)
new_image = transform(image)  # Image Classification
new_image, new_bounding_boxes, new_labels = transform(image, bounding_boxes, labels)  # Object Detection
new_image, new_bounding_boxes, new_masks, new_labels = transform(
    image, bounding_boxes, masks, labels
)  # Instance Segmentation
new_image, new_target = transform((image, {"boxes": bounding_boxes, "labels": labels}))  # Arbitrary Structure
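
# As a hypothetical sanity check (not part of the original example), we can
# verify that the random parameters were indeed shared across inputs: after a
# joint transform, the image and the masks still agree on their spatial size.
assert new_image.shape[-2:] == new_masks.shape[-2:]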

########################################################################################################################
# Under the hood, :mod:`torchvision.transforms.v2` relies on :mod:`torchvision.datapoints` for the dispatch to the
# appropriate function for the input data: :ref:`sphx_glr_auto_examples_plot_datapoints.py`. Note, however, that as a
# regular user, you likely don't have to touch this yourself. See
# :ref:`sphx_glr_auto_examples_plot_transforms_v2_e2e.py`.
#
# All "foreign" types like :class:`str`'s or :class:`pathlib.Path`'s are passed through, allowing you to store extra
# information directly with the sample:

sample = {"path": path, "image": image}
new_sample = transform(sample)

assert new_sample["path"] is sample["path"]
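
# A small sketch of the dispatch mentioned above (again not part of the
# original example): wrapping a plain tensor in a datapoint class controls
# which kernels the transforms dispatch to, and the wrapper type is preserved
# in the output.
from torchvision import datapoints

wrapped = datapoints.Image(torch.rand(3, 512, 512))
new_wrapped = transform(wrapped)
print(type(new_wrapped), new_wrapped.shape)  # still an Image, now cropped to 480x480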

########################################################################################################################
# As stated above, :mod:`torchvision.transforms.v2` is a drop-in replacement for :mod:`torchvision.transforms` and thus
# also supports transforming plain :class:`torch.Tensor`'s as images or videos where applicable. This is achieved with
# a simple heuristic:
#
# * If we find an explicit image or video (:class:`torchvision.datapoints.Image`, :class:`torchvision.datapoints.Video`,
# or :class:`PIL.Image.Image`) in the input, all other plain tensors are passed through.
# * If there is no explicit image or video, only the first plain :class:`torch.Tensor` will be transformed as image or
# video, while all others will be passed through.

plain_tensor_image = torch.rand(image.shape)

print(image.shape, plain_tensor_image.shape)

# Passing a plain tensor together with an explicit image will not transform the former
plain_tensor_image, image = transform(plain_tensor_image, image)

print(image.shape, plain_tensor_image.shape)

# Passing a plain tensor without an explicit image will transform the former
plain_tensor_image, _ = transform(plain_tensor_image, bounding_boxes)

print(image.shape, plain_tensor_image.shape)
Comment on lines +87 to +109
@NicolasHug (Member) commented on Feb 24, 2023:
I wonder whether we should actually document the heuristic, since it's a potentially controversial part of the contract right now. I feel like it'd be fine to just document that "a single plain tensor is treated as an image for full BC", and leave out the rest.

But no strong opinion.

A member replied:
Basically, because this is an introductory example, I kinda fear that users will think "oh boy, I have to understand all this, and this looks complicated" when in reality 99% of users don't have to worry about this behaviour at all.

A member replied:
Let's merge, and we'll try to think of a better way to still document this.