
[NOMERGE] Feedback Transforms API implementation #5375


Closed
4 changes: 4 additions & 0 deletions torchvision/prototype/transforms/_geometry.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,10 @@
from torchvision.prototype.transforms import Transform


# I recommend deleting all of the old classes from this branch to assist code reviews.
# I know that you are already working on this in #5323, but I would try to keep your feature branch as clean as possible.
# The issue here is that these implementations are incompatible with the functional approach on the same branch which can lead to confusion.

class HorizontalFlip(Transform):
NO_OP_FEATURE_TYPES = {Label}

Expand Down
1 change: 1 addition & 0 deletions torchvision/prototype/transforms/functional/_augment.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ def mixup_one_hot_label(one_hot_label_batch: torch.Tensor, *, lam: float, inplac


def cutmix_image(image: torch.Tensor, *, box: Tuple[int, int, int, int], inplace: bool = False) -> torch.Tensor:
# Shouldn't we be using `image_batch`, similar to `mixup_image()`, to indicate that the input must be a batch?
if not inplace:
image = image.clone()

Expand Down
12 changes: 6 additions & 6 deletions torchvision/prototype/transforms/functional/_geometry.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,15 +3,15 @@
import torch
from torchvision.prototype.features import BoundingBoxFormat
from torchvision.transforms import ( # noqa: F401
functional as _F,
    functional as _F,  # Shouldn't we be importing `functional_tensor`?
InterpolationMode,
)

from ._meta_conversion import convert_bounding_box_format


def horizontal_flip_image(image: torch.Tensor) -> torch.Tensor:
return image.flip((-1,))
    return image.flip((-1,))  # Why not use `_F.hflip()`?


def horizontal_flip_bounding_box(bounding_box: torch.Tensor, *, image_size: Tuple[int, int]) -> torch.Tensor:
Expand All @@ -20,15 +20,15 @@ def horizontal_flip_bounding_box(bounding_box: torch.Tensor, *, image_size: Tupl
old_format=BoundingBoxFormat.XYXY,
new_format=BoundingBoxFormat.XYWH,
).unbind(-1)
x = image_size[1] - (x + w)
    x = image_size[1] - (x + w)  # Why not avoid the format conversion and do `boxes[:, [0, 2]] = width - boxes[:, [2, 0]]` directly?
return convert_bounding_box_format(
torch.stack((x, y, w, h), dim=-1),
old_format=BoundingBoxFormat.XYWH,
new_format=BoundingBoxFormat.XYXY,
)
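For illustration, the direct flip suggested in the comment above could be sketched as follows. This is a hypothetical snippet, not part of the PR; `hflip_xyxy_direct` is an invented name, and the boxes are assumed to be in XYXY format:

```python
import torch

# Hypothetical sketch: flip XYXY boxes horizontally without converting
# to XYWH and back. Assumes boxes are (..., 4) tensors in XYXY format.
def hflip_xyxy_direct(boxes: torch.Tensor, image_width: int) -> torch.Tensor:
    flipped = boxes.clone()
    # New x1 = width - old x2 and new x2 = width - old x1, so x1 <= x2 holds.
    flipped[..., [0, 2]] = image_width - boxes[..., [2, 0]]
    return flipped

boxes = torch.tensor([[10.0, 20.0, 50.0, 60.0]])  # x1, y1, x2, y2
print(hflip_xyxy_direct(boxes, image_width=100))
```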


_resize_image = _F.resize
_resize_image = _F.resize  # What about videos? The low-level transforms support them.
Collaborator

@pmeier pmeier Feb 4, 2022

Right now there is no custom feature type for videos, and the tentative plan is to treat videos as `Image`s with extra batch dimensions, cc @bjuncek. Since the kernels should be agnostic to batch dimensions, videos are automatically supported by all image transforms.

Contributor Author

We should finalize this plan soon. We have outstanding work on videos, so we need an idea of how this will work for them. Once the plan is finalized, we will be able to parallelize the work of porting all the transforms.

I agree that the underlying kernels should be batch-agnostic; sounds good.

Collaborator

For reference, the initial support for videos happens in #5335. @bjuncek is checking with other researchers working on videos if the API is sufficient or if we are missing something important.

Contributor

Annoyingly, I still haven't gotten a reply from any of them. @datumbox, maybe you could help me offline in order to make this discussion happen?

> tentatively plan is to treat videos as Image's with an extra batch dimensions

Is there a downside to creating a video feature class that, for the time being, implements just this, but could be extended with things like audio, cc-data, or encoded bytes if support for them is ever needed?

Contributor Author

@bjuncek ping me offline with the names of the people you are trying to reach, to see if I can help.

When we say "extra batch dimensions", could you clarify the proposal?

Collaborator

I don't think there is a real downside other than being more verbose in our transforms implementations. For example, we would need a `resize_video` that simply defers to `resize_image`, and this has to happen for every single image / video kernel.
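The deferral pattern described here could be sketched as follows. This is a hypothetical illustration, not code from the PR; `horizontal_flip_video` is an invented name, and videos are assumed to be `(num_frames, C, H, W)` tensors:

```python
import torch

# The image kernel is written to be agnostic to leading batch dimensions.
def horizontal_flip_image(image: torch.Tensor) -> torch.Tensor:
    return image.flip((-1,))

# Hypothetical video kernel: a video is just an image tensor with an
# extra leading (frame) dimension, so it defers to the image kernel.
def horizontal_flip_video(video: torch.Tensor) -> torch.Tensor:
    return horizontal_flip_image(video)

video = torch.rand(8, 3, 16, 16)  # 8 frames of 3x16x16
assert horizontal_flip_video(video).shape == video.shape
```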



def resize_image(
Expand All @@ -40,7 +40,7 @@ def resize_image(
) -> torch.Tensor:
new_height, new_width = size
num_channels, old_height, old_width = image.shape[-3:]
batch_shape = image.shape[:-3]
batch_shape = image.shape[:-3] # In some places you use `image` to denote batches and in others `image_batch`. Should we align the names?
return _resize_image(
image.reshape((-1, num_channels, old_height, old_width)),
size=size,
Expand Down Expand Up @@ -71,7 +71,7 @@ def resize_bounding_box(
) -> torch.Tensor:
old_height, old_width = old_image_size
new_height, new_width = new_image_size
return (
return ( # Shouldn't we be using a low-level kernel instead, similar to above?
bounding_box.view(-1, 2, 2)
.mul(torch.tensor([new_width / old_width, new_height / old_height]))
.view(bounding_box.shape)
Expand Down
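The box-resizing arithmetic above can be illustrated standalone. This is a hypothetical sketch, not part of the PR; `resize_xyxy` is an invented name, and boxes are assumed to be XYXY:

```python
import torch

# Hypothetical sketch: view XYXY boxes as pairs of (x, y) points and
# scale each axis by new_size / old_size. Sizes are (height, width).
def resize_xyxy(boxes: torch.Tensor, old_size, new_size) -> torch.Tensor:
    old_height, old_width = old_size
    new_height, new_width = new_size
    scale = torch.tensor([new_width / old_width, new_height / old_height])
    # (..., 4) -> (-1, 2, 2): each box becomes two (x, y) corners.
    return boxes.view(-1, 2, 2).mul(scale).view(boxes.shape)

boxes = torch.tensor([[10.0, 20.0, 50.0, 80.0]])
print(resize_xyxy(boxes, old_size=(100, 100), new_size=(200, 50)))
```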
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@


def _xywh_to_xyxy(xywh: torch.Tensor) -> torch.Tensor:
xyxy = xywh.clone()
xyxy = xywh.clone() # Nice. Do we have tests to ensure that no transform does an in-place modification of input?
xyxy[..., 2:] += xyxy[..., :2]
return xyxy

Expand Down