
RFC: transforms #1


Open · wants to merge 26 commits into master

Conversation


@pmeier pmeier commented Jun 25, 2021

This is an RFC on how transforms will work with the datasets after the rework of the datasets API. Rendered version here


pmeier commented Jun 28, 2021

After some more thought, I see two additional problems that we need to solve. Happy to hear your ideas / thoughts.


1. What happens if a transform changes some feature attributes, for example by cropping an image or converting a bounding box to a new format? In the first case the information is at least still encoded in the new shape of the image, but for a bounding box conversion the information is gone. Thus, we need to somehow reflect these changes back to the features.

To achieve that, we could either change the feature dictionary in place, or we would need to somehow return a changed version with each transform.


2. In its current state the FeatureTransform is able to transform multiple features, but has no notion of coherence between the features. Thus, it is impossible to share data between individual transformation steps. For example, to rotate a bounding box we need the shape of the underlying image, but there is no way to share it (see the sketch below). The same goes for joint random transforms that need to act on the same set of parameters drawn for each sample.
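
A minimal sketch of the second problem; the function names are hypothetical and only illustrate the missing channel for shared data:

def rotate_image(image, *, degrees):
    ...  # has everything it needs in image and degrees

def rotate_bounding_box(bounding_box, *, degrees):
    # rotating the box around the image center requires the height and
    # width of the underlying image, but only the box itself is passed
    # in, and there is no channel to share that information
    ...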

@NicolasHug NicolasHug left a comment

Thanks a lot for putting this together @pmeier !

I'm trying to get a better sense of the scope and the challenges here, so I mostly have questions at this point :)

I'll start here with a naive one: instead of introducing the new SampleTransform and FeatureTransform classes, could we "simply" (<-- gotta love this word) extend our current transforms from torchvision.transforms to handle not just tensors, but also X where X is what the datapipe returns? This way users could basically do dataset.map(torchvision.transforms.Rotate(30.0)), which seems optimal in terms of UX.

(I'm not saying we should do this, but the answer to this question will help me better understand where we're going)


Also, a seemingly related discussion sparked in pytorch/vision#4029 (comment). If we start supporting new Tensor-like classes like class Image(Tensor) and class BoundingBox(Tensor), we might be able to avoid the introduction of the new Feature structure.

@pmeier pmeier left a comment

@NicolasHug

Also, a seemingly related discussion sparked in pytorch/vision#4029 (comment). If we start supporting new Tensor-like classes like class Image(Tensor) and class BoundingBox(Tensor), we might be able to avoid the introduction of the new Feature structure.

If we can do that, it would solve a lot of the questions you have and issues I mentioned. PyTorch offers the __torch_function__ hook. Without going into details, it makes it possible to perform any PyTorch operation, for example transforming an image, and return our custom class. So something like

>>> image = torchvision.Image(torch.rand(3, 256, 256))
>>> transformed_image = image + 1
>>> type(transformed_image)
<class 'torchvision.Image'>

is possible without any hacks from our side. Doing something like this of course implies that we convert all numerics to tensors.
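
As a minimal sketch of how this could work (the Image class and its constructor here are illustrative, not a settled API), PyTorch's default __torch_function__ machinery already returns the subclass for most operations:

import torch

class Image(torch.Tensor):
    def __new__(cls, data):
        # _make_subclass wraps an existing tensor in the subclass
        # without copying the underlying data
        return torch.Tensor._make_subclass(cls, torch.as_tensor(data))

image = Image(torch.rand(3, 256, 256))
print(type(image + 1))  # <class '__main__.Image'>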

The problem we had with this approach is that we were unable to come up with a strict enough rule set that would let us determine whether an arbitrary operation should return our custom tensor type. Take the example above: is a floating-point tensor with values outside of the range [0, 1] still an image? To name a second one: if you subtract a random real number from a class label, is the result still a class label? You can come up with a number of these questions for all proposed custom tensor classes.

It should be fairly easy to control this for all builtin transformations, but we can't guarantee that in general.

Thus, we decided not to encode this extra information in the tensor itself. The new datapipes will return a dictionary as a sample. Encoding the information in the keys rather than the values is also possible, but this has two problems:

  • Having non-string keys would make working with a sample dictionary a lot more cumbersome.
  • We can't guarantee that a user transformation keeps the keys either, so this has the same problems as encoding the information in the values of the dictionary.

What is the rationale for separating the Feature type and the value?

For example, what is the benefit of this over

transform(Image(torch.rand(3, 256, 256)))
transform(BoundingBox((0, 0, 256, 256), type="XYHW"))

?

Thus, we came up with the plan to encode the extra information in a custom support structure.

IMO in the end this all boils down to two possible scenarios:

  1. We provide a more rigid framework that is based on the assumption that the transform ensures that it only returns valid tensor types. This makes the UX better for users that only rely on builtin transforms. At the same time it will be harder to implement custom transformations.
  2. We base all of our stuff on very minimal assumptions at the cost of a worse UX.

I vote for option 1., but let's discuss this idea a little more.

This would also make something like

could we "simply" (<-- gotta love this word) extend our currents transforms from torchvsion.transforms to handle not just tensors, but also X where X is what the datapipe returns? This way users could basically do dataset.map(torchvision.transforms.Rotate(30.0)), which seems optimal in terms of UX.

a lot simpler, because now each transform would know which feature transform to dispatch to based on the input type.
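
A rough sketch of such dispatch, assuming Image and BoundingBox tensor subclasses as discussed above (the method names mirror the proposal, the bodies are stubs):

class Rotate:
    def __init__(self, degrees):
        self.degrees = degrees

    @staticmethod
    def image(input, *, degrees):
        ...  # rotate the pixel data

    @staticmethod
    def bounding_box(input, *, degrees):
        ...  # rotate the box coordinates

    def __call__(self, input):
        # dispatch on the feature type; unknown types pass through unchanged
        if isinstance(input, Image):
            return self.image(input, degrees=self.degrees)
        if isinstance(input, BoundingBox):
            return self.bounding_box(input, degrees=self.degrees)
        return input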

@pmeier
Copy link
Owner Author

pmeier commented Jul 5, 2021

After some more thought, I think @NicolasHug's comments hit the nail on the head: my proposal was complicated and still wasn't able to solve the issues that both he and I pointed out. Thus, I rewrote the proposal under the assumption that we can use custom tensor classes. I've added an implications section to discuss this. Overall I like the new proposal much better. Happy to hear your thoughts.

@pmeier pmeier requested a review from NicolasHug July 5, 2021 14:43
@fmassa fmassa left a comment

This is looking pretty good, thanks a lot @pmeier !

As we've discussed on VC, here are a few notes from our chat.

__all__ = ["Feature"]


class Feature(torch.Tensor):
Collaborator

This is great!

One meta-question that the team needs to figure out is whether we will support torchscript for those features or not. IIUC this might not propagate the inherited type to torchscripted functions.

@pmeier pmeier Aug 20, 2021

I've split this into Feature and TensorFeature, since a Feature might not necessarily be a Tensor. For example, Text cannot be represented by a tensor, but could very well be regarded as a Feature from torchtext.

def format(self) -> BoundingBoxFormat:
    return self._format

def convert(self, format: Union[str, BoundingBoxFormat]) -> "BoundingBox":
Collaborator

Nice! One other thing to think about is whether there is a minimum common API across all our data types that we might want to enforce.

Owner Author

Currently, I don't see anything. Happy to hear ideas though. If something comes up later, that should be retrofittable.

Owner Author

I've added a TensorFeature.from_tensor method that should make it easy to create a new feature from just a tensor.
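
For illustration, a minimal sketch of what such a helper could look like (not necessarily the exact implementation in the PR):

import torch

class TensorFeature(torch.Tensor):
    @classmethod
    def from_tensor(cls, tensor):
        # wrap an existing tensor in the feature subclass without copying
        return torch.Tensor._make_subclass(cls, tensor)

feature = TensorFeature.from_tensor(torch.rand(3, 256, 256))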

    return Image(input.flip((-1,)))

@staticmethod
def bounding_box(input: BoundingBox) -> BoundingBox:
@datumbox datumbox Aug 31, 2021

Agreed. This way, when possible, we share the same functional implementation for images, bboxes, and masks, and when not possible we implement a new one. Also, we leave it up to the user to make the call depending on the use case rather than assuming the bbox is in some part of the target.

See also this corner case transform that requires joint transform of all image, label, bbox:
https://github.com/pytorch/vision/blob/96f6e0a117d5c56f7e0237851dbb96144ebb110b/references/detection/transforms.py#L54-L129

Owner Author

Not sure I understand:

we share the same functional implementation for images and bboxes and when not possible we implement new

In most cases the implementation will be different for images, bounding boxes, and possibly other types. In any case, we will always have a separate method for each type. In case the transformation is the same, it is the developer's responsibility to implement it with minimal duplication:

class FooTransform(Transform):
    @staticmethod
    def _foo(input):
        return input

    @staticmethod
    def image(input: Image) -> Image:
        return FooTransform._foo(input)

    @staticmethod
    def bounding_box(input: BoundingBox) -> BoundingBox:
        return FooTransform._foo(input)

Also we leave it up to the user to make the call depending on the use-case rather than assuming the BBox is in some part of the target.

That is actually what we are trying to avoid. If you want to, you can call HorizontalFlip.image(...), but during normal operation you would do

transform = HorizontalFlip()
transformed_image = transform(image)
transformed_bbox = transform(bbox)

This is needed to be able to transform a complete sample drawn from a dataset, which might include more than just an image.
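
For example, transforming a whole sample could then look like this sketch, assuming values the transform does not recognize pass through unchanged (the sample keys are illustrative):

transform = HorizontalFlip()
sample = {"image": image, "bounding_box": bbox, "label": label}
# the transform dispatches on each value's feature type and leaves
# everything it does not recognize (e.g. the label) untouched
transformed_sample = {key: transform(value) for key, value in sample.items()}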

For a better overview of what we are trying to achieve, please read the accompanying document which is updated as this RFC progresses.

Collaborator

@pmeier I didn't disagree with you. I just said I like the approach :)

What I meant is that there are a few transforms that can be implemented for images and bboxes in the same way. That can live in the functional API and, as you said, we can call it from both image() and bounding_box(). But the proposed API also supports cases where this is not possible, by providing separate implementations for the two.

That is actually what we are trying to avoid.

Yes, I had a discussion with @fmassa who explained that you want to avoid this to make things composable. Sounds good. Just make sure you look at the example I sent you, because there are transforms where image, bbox, and labels must be processed together. I think the API can cover that, but it's worth having this use case in mind.

Comment on lines 55 to 57
# transform takes none or more than one positional argument
if len(argspec.args) != 1:
    continue
@fmassa fmassa Aug 31, 2021

As we discussed in VC, I believe this can lead to subtle bugs. If the user writes their transform as

class Rotate(Transform):
    @staticmethod
    def image(img, angle):
        pass

instead of

class Rotate(Transform):
    @staticmethod
    def image(img, *, angle):
        pass

then the transformation will not be automatically registered, and no errors / warnings will show up; the transform will just silently not be applied because it wasn't properly auto-registered.

I think it might be better to be explicit here and instead use decorators to register a function to a transform. Something in the lines of

class Rotate(Transform):
    pass

@register_transform(transform=Rotate, input_type=Image)
def rotate_image(img, angle):
    pass

@pmeier pmeier Sep 1, 2021

I think it might be better to be explicit here and instead use decorators to register a function to a transform.

  1. I dislike this approach, because we would have to manually namespace everything with the feature type. For example, I think it is more concise to have Rotate.image and Rotate.bounding_box rather than rotate_image and rotate_bounding_box. Note that in the latter case, the namespace would have roughly n times more entries where n is the number of features we support.

  2. The auto registering can be disabled and the registering can be performed manually in the constructor:

    class Rotate(Transform, auto_register=False):
        def __init__(self):
            super().__init__()
            self.register_feature_transform(Image, self.foo)
    
        @staticmethod
        def foo(input, angle):
            pass
  3. If we ultimately go with registering standalone functions with a decorator, I suggest we use a classmethod of the transform, i.e.

    class Rotate(Transform):
        pass
    
    
    @Rotate.register(Image)
    def rotate_image(input, angle):
        pass
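
For reference, a minimal sketch of how such a register classmethod could work (illustrative only, not the PR's implementation):

class Transform:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # each subclass gets its own registry of feature transforms
        cls._feature_transforms = {}

    @classmethod
    def register(cls, feature_type):
        def decorator(feature_transform):
            cls._feature_transforms[feature_type] = feature_transform
            return feature_transform
        return decorator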

As we discussed in VC, I believe this can lead to subtle bugs.

True, good catch! The rule can be made more robust though. We invoke the feature transforms with feature_transform(input, **params), where params can be an empty dict. That means all of these signatures need to be supported:

def foo(input):
    pass

def foo(input=None):
    pass

def foo(input, bar):  # your example
    pass

def foo(input, bar=None):
    pass

def foo(input=None, bar=None):
    pass

def foo(input, *, bar):
    pass

def foo(input, *, bar=None):
    pass

def foo(input=None, *, bar):
    pass

def foo(input=None, *, bar=None):
    pass

For Python >=3.8 we also need to keep positional-only arguments in mind.
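
A sketch of such a compatibility check using inspect; the exact rule here is an assumption, the real one lives in the commit mentioned in the follow-up:

import inspect

def is_compatible(feature_transform):
    # the transform will be invoked as feature_transform(input, **params)
    params = list(inspect.signature(feature_transform).parameters.values())
    if not params:
        return False
    first, rest = params[0], params[1:]
    # the first parameter must accept `input` positionally
    if first.kind not in (first.POSITIONAL_ONLY, first.POSITIONAL_OR_KEYWORD):
        return False
    # all remaining parameters must be addressable by keyword
    return all(
        p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY, p.VAR_KEYWORD)
        for p in rest
    )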

For the debugging, the user can use transform.is_supported (EDIT: as of writing this comment, this method only existed locally. It is included in the most recent commit) and pass a feature type or instance to see if it would be transformed or not. But I agree that one would probably only do that after discovering that something is wrong. Another idea would be to add a verbose flag that prints extra information during the auto registration.


Since all of this is a trade-off between convenience with possible subtle bugs and explicitness with higher verbosity, I think we should wait for more opinions before we decide.

Owner Author

I've added a commit that makes the matching more robust and adds the option to print information about the process:

class TestTransform(Transform, verbose=True):
    @staticmethod
    def incompatible_signature1():
        pass

    @staticmethod
    def incompatible_signature2(*, input):
        pass

    # Python >=3.8 only
    # @staticmethod
    # def incompatible_signature3(input, foo, /):
    #     pass

    @staticmethod
    def _private(input):
        pass

    @staticmethod
    def unknown(input):
        pass

    @staticmethod
    def imaeg(input):
        pass

    @staticmethod
    def image(input, foo):
        pass

    @staticmethod
    def boundingbox(input):
        pass

    @staticmethod
    def bounding_box(input, *, foo):
        pass
TestTransform._private() was not registered as feature transform, because it is private.
TestTransform.bounding_box() was registered as feature transform for type 'BoundingBox'.
TestTransform.boundingbox() was not registered as feature transform, because its name doesn't match any known feature type. Did you mean to name it 'bounding_box' to be registered for type 'BoundingBox'?
TestTransform.imaeg() was not registered as feature transform, because its name doesn't match any known feature type. Did you mean to name it 'image' to be registered for type 'Image'?
TestTransform.image() was registered as feature transform for type 'Image'.
TestTransform.incompatible_signature1() was not registered as feature transform, because it cannot be invoked with incompatible_signature1(input, **params).
TestTransform.incompatible_signature2() was not registered as feature transform, because it cannot be invoked with incompatible_signature2(input, **params).
TestTransform.unknown() was not registered as feature transform, because its name doesn't match any known feature type.

self.formats.XYWH: self._xyxy_to_xywh,
self.formats.CXCYWH: self._xyxy_to_cxcywh,
}

Collaborator

For when we will be discussing specifics of the data classes, it might be good to add a __torch_function__ that forbids applying operations on two bounding boxes if they have different image_size, while still allowing BoundingBox + Tensor to work.
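
A rough sketch of such a guard, assuming a BoundingBox subclass that carries an image_size attribute (names and details are illustrative):

import torch

class BoundingBox(torch.Tensor):
    def __new__(cls, data, *, image_size):
        bounding_box = torch.Tensor._make_subclass(cls, torch.as_tensor(data))
        bounding_box.image_size = image_size
        return bounding_box

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # mixing boxes that live on differently sized images is an error,
        # while BoundingBox + Tensor still works since only one size is found
        image_sizes = {a.image_size for a in args if isinstance(a, BoundingBox)}
        if len(image_sizes) > 1:
            raise ValueError(f"mismatching image sizes: {image_sizes}")
        return super().__torch_function__(func, types, args, kwargs or {})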


ppwwyyxx commented Sep 2, 2021

It seems a big issue of the current prototype is that it cannot implement Compose.get_params and therefore cannot implement nested Compose.

  • Why it cannot implement Compose.get_params (efficiently):
    Compose.get_params cannot be implemented simply by calling each sub-transform's get_params. For example, let's try to obtain Compose.get_params of Compose([A, B, C]). To call B.get_params you need the input sample of B. Therefore you first need to run the transform A on the sample before you can call B.get_params.
    In other words, getting the params already requires running/applying the transform. When the params are returned, they are used again (in Transform.forward()) to apply the transform, which is a 2x waste of computation, as the sketch below illustrates.
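
A hypothetical sketch of that inefficient implementation:

class Compose:
    def __init__(self, transforms):
        self.transforms = transforms

    def get_params(self, sample):
        params = []
        for transform in self.transforms:
            transform_params = transform.get_params(sample)
            params.append(transform_params)
            # the next get_params needs the transformed sample, so the
            # transform already has to be applied here
            sample = transform(sample, params=transform_params)
        return params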

It may seem that nested Compose is useless because it can be flattened, but that's not the case. Beyond Compose as a sequential container, there might be other types of containers, such as Compose([RandomOrder([T1, T2, T3]), T4]) or RandomChoice([Compose([T1, T2]), T3]). Pipelines with nested structures will all have this issue.


pmeier commented Sep 2, 2021

@ppwwyyxx

It seems a big issue of the current prototype is that it cannot implement Compose.get_params, therefore cannot implement nested Compose.

I agree with the argument, but disagree with the conclusion.

During "normal" operation the transform containers (Compose, RandomChoice, ...) do not need access to the get_params method. Every transformation simply receives the sample it is supposed to transform and thus there is no duplicated computation. Example: Assume we have Compose[Compose[T1, T2], T3]. T1 receives the sample, T2 receives T1(sample), and T3 receives Compose[T1, T2](sample).

Still, there might be scenarios where your conclusion is right.

If we have two equal but separate transforms that should operate with the same parameters:

transform1 = MyTransform()
transform2 = MyTransform()
params = transform1.get_params()
transformed_image1 = transform1(image, params=params)
transformed_image2 = transform2(image, params=params)

I currently fail to see a use case for this, but maybe there is one.


pmeier commented Sep 2, 2021

@ppwwyyxx I've added minimal implementations for RandomChoice and RandomApply.


pmeier commented Sep 2, 2021

With the latest commits, the accompanying document is no longer in sync. I'll update it ASAP.

@ppwwyyxx

Thanks @pmeier for following up! I agree that you can implement the containers by giving up the ability to do get_params. However, I do feel that having get_params available is pretty important. I'll list some scenarios where this is useful:

  1. With get_params, users can save the params and later use them to apply an inverse of the transforms (if the transform supports it). This is useful during inference as well as for test-time augmentation, where we transform the images, detect objects on the new image, and then need to invert the transform to map the detected boxes/masks back to the original image space.

  2. Users can obtain a sequence of params and apply the transforms one by one. This allows them to insert logic in between. In code this looks like:

# transforms: a composite transform container (e.g. with Compose, RandomOrder, etc.)
tfms_and_prms: List[Tuple[Transform, dict]] = transforms.get_params(inputs)
for tfm, param in tfms_and_prms:
    inputs = tfm(inputs, params=param)
    # HERE: do some extra work with inputs

Inserting this logic has a few use cases, e.g.:

  • For bounding boxes, it's up to users whether they want to clip the boxes to the image size after every transform (might be needed after rotation/affine). They can make this decision here.
  • For keypoints, users need to modify the keypoint labels (e.g. left eye -> right eye) if the transform is a flip. Users can implement this logic if they can obtain the parameters of the transform to check whether a flip has happened.
  • Keypoints in COCO have a "visibility" field; users may (or may not) want to check after every transform whether some keypoints are out of range and mark them as invisible.

This extra but very simple logic can be added in the above snippet without changing the input types the transforms will see.
Without the ability to manually apply the params, the above custom logic still seems doable, but it will probably need to be added as new data types or subclasses of existing data types, with new transform methods registered for those types. I'm not sure at the moment how easy it is to extend the types, but maybe that's not bad.

However, the above snippet is actually fundamentally difficult to realize because, as I commented earlier, doing get_params already requires running the transforms, so we waste some computation. The way we address this in detectron2 is to separately define "a minimal subset of inputs that are needed to get_params", which we called "AugInput". This also feels a bit awkward, so I don't have a good suggestion on what exactly to do. But I hope the above example use cases are useful.


pmeier commented Sep 13, 2021

@ppwwyyxx Thanks a lot for your input!

  1. With get_params users can save the params and later use them to apply an inverse of the transforms (if the transform supports it). This is useful during inference as well as test-time augmentation, where we transform the images, detect objects on the new image, and then need to invert the transform so we map the detected boxes/masks back to the original image space.

I agree, that is an important use case that should be supported. Would it be sufficient to have a functional inverse

transform = MyComplexTransform()

sample = next(dataset)

transformed_sample = transform(sample)
transformed_prediction = model(transformed_sample["input"])

# magically materialize this here, more about that later
params = ...
prediction = transform.inverse(transformed_prediction, params=params)

or should we actually instantiate an inverse transform?

inverse of the transforms (if the transform supports it)

How do you decide whether a transform has an inverse or not? For example, crop is the inverse of pad, but is pad also considered an inverse of crop? I would say no for images, since pad cannot retrieve the information lost by crop. For bounding boxes or keypoints, on the other hand, I would say yes, since no information is lost there (see the toy example below).
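
As a toy example of this asymmetry for bounding boxes, crop and pad are mere coordinate shifts and thus exact inverses of each other (the helpers are hypothetical):

def crop_box(box_xyxy, *, top, left):
    x1, y1, x2, y2 = box_xyxy
    return (x1 - left, y1 - top, x2 - left, y2 - top)

def pad_box(box_xyxy, *, top, left):
    x1, y1, x2, y2 = box_xyxy
    return (x1 + left, y1 + top, x2 + left, y2 + top)

# shifting the coordinates out and back loses nothing, unlike for pixels
assert pad_box(crop_box((10, 10, 50, 50), top=5, left=5), top=5, left=5) == (10, 10, 50, 50)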

  2. Users can obtain a sequence of params, and apply the transforms one by one. This allows them to insert logic in between.

Although I generally dislike polymorphism, it might make things simpler here. We could have a return_params: bool = False flag in the forward such that

transform = MyComplexTransform()

sample = next(dataset)

transformed_sample1, params = transform(sample, return_params=True)
transformed_sample2 = transform(sample, params=params)

assert transformed_sample1 == transformed_sample2

This would transform your snippet into

# transforms: a composite transform container (e.g. with Compose, RandomOrder, etc.)
for tfm in transforms:
    inputs, params = tfm(inputs, return_params=True)
    # HERE: do some extra work with inputs

and avoid extracting the params upfront, which would otherwise require either doubled computation or dummy inputs.
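
For concreteness, a sketch of a forward that supports both calling conventions (_transform stands in for the actual per-feature dispatch and is an assumption here):

import torch

class Transform(torch.nn.Module):
    def get_params(self, sample):
        # draw random parameters for one sample; empty by default
        return {}

    def _transform(self, sample, params):
        raise NotImplementedError

    def forward(self, sample, *, params=None, return_params=False):
        if params is None:
            params = self.get_params(sample)
        transformed_sample = self._transform(sample, params)
        return (transformed_sample, params) if return_params else transformed_sample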

What do you think?
