Fill color support for tensor affine transforms #2904


Merged: 22 commits merged into pytorch:master on Dec 2, 2020

Conversation

voldemortX (Contributor) commented Oct 27, 2020

PR for Issue #2887

It seems that affine(), rotate() and perspective() all use _apply_grid_transform(), so I added the code there.
I ran the unit tests for all three functions and they passed, but I can't figure out whether the behavior is right for bilinear. E.g., if I change

for r in [0, ]:

to [0, 2], neither the original code nor the current commit passes the tests.

P.S. I only have CUDA 10.0 at the moment, so I had to bypass the version check in torchvision/extensions.py; I'm not sure whether that caused the test failures mentioned above.

vfdev-5 (Collaborator) left a comment

Thanks a lot for the PR @voldemortX!
I left a few comments for minor improvements. We also have to update the docstrings for F.affine and the others.

vfdev-5 (Collaborator) commented Oct 27, 2020

> I can't figure out whether it is right for bilinear

@voldemortX this is a good question, and you are right: those tests do not pass because interpolation results differ between PIL and torch. So, for instance, we can skip the tests for resample=2.

voldemortX (Contributor Author)

Actually, I have another question that needs clearing up: if all we are considering are linear operations (Euclidean transforms and linear interpolation), shouldn't the dummy mask always be 0/1 after affine ops? Even if that is the case, I'd still prefer sticking with 0.5 thresholding.

vfdev-5 (Collaborator) commented Oct 27, 2020

> the dummy mask should always be 0/1 after affine ops?

Actually, no, not if resample > 0. You'll have float values on the boundaries due to interpolation between 0 and 1, right? For example:

import torch
from torchvision.transforms.functional import affine

mask = torch.ones(1, 8, 8, dtype=torch.float32)
res = affine(mask, angle=45, translate=[0.1, 0.0], scale=1.0, shear=[0., 0.], resample=2)
res[0, :, :]
> 
tensor([[0.0000, 0.1866, 0.8938, 1.0000, 1.0000, 1.0000, 0.3281, 0.0000],
        [0.1866, 0.8938, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 0.3281],
        [0.8938, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [0.8938, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [0.1866, 0.8938, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 0.3281],
        [0.0000, 0.1866, 0.8938, 1.0000, 1.0000, 1.0000, 0.3281, 0.0000]])

voldemortX (Contributor Author) commented Oct 27, 2020

> Actually, no, not if resample > 0. You'll have float values on the boundaries due to interpolation between 0 and 1, right? [...]

Well, in that case, if we replace a pixel's value with the fill color directly in bilinear resample mode (as just implemented), wouldn't that behave differently from this?

Maybe we need img * mask + (1 - mask) * fillcolor instead?
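For illustration, a minimal sketch of that blend; the shapes and names (img, mask, fill) here are made up for the example, not the actual torchvision internals:

import torch

# img: the warped image; mask: the warped all-ones dummy channel, which is
# 1.0 inside the original image, 0.0 outside, and fractional on the boundary.
img = torch.rand(1, 3, 8, 8)
mask = torch.ones(1, 1, 8, 8)  # stand-in for the warped dummy mask
fill = torch.tensor([0.5, 0.25, 1.0]).view(1, 3, 1, 1)  # per-channel fill

# Blend instead of hard replacement: boundary pixels get a mix of image
# and fill proportional to the interpolated mask value.
out = img * mask + (1.0 - mask) * fill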

codecov bot commented Oct 27, 2020

Codecov Report

Merging #2904 into master will decrease coverage by 0.08%.
The diff coverage is 80.48%.


@@            Coverage Diff             @@
##           master    #2904      +/-   ##
==========================================
- Coverage   73.41%   73.32%   -0.09%     
==========================================
  Files          99       99              
  Lines        8801     8840      +39     
  Branches     1389     1397       +8     
==========================================
+ Hits         6461     6482      +21     
- Misses       1915     1930      +15     
- Partials      425      428       +3     
Impacted Files                                  Coverage Δ
torchvision/transforms/functional.py            80.44% <ø> (ø)
torchvision/transforms/functional_tensor.py     73.83% <75.00%> (-0.32%) ⬇️
torchvision/transforms/transforms.py            80.51% <84.00%> (-0.01%) ⬇️
torchvision/io/__init__.py                      68.75% <0.00%> (-20.91%) ⬇️


vfdev-5 (Collaborator) commented Oct 27, 2020

@voldemortX I see what you mean about img * mask + (1 - mask) * fillcolor. This is probably something to adopt, but let me check.

voldemortX (Contributor Author)

One more thought that needs checking: affine(), rotate() and perspective() involve no inter-channel operations, so they could all support an n-tuple for the fill color (in addition to a single float or int value). I'm not so sure, though, since the docstrings for F.affine and F_pil.affine state only int for fillcolor.

If this is correct, I'll try to support n-tuples as well.

Btw, do I need to change the base test class to test float images, e.g. add a new function _create_float_data() or a new metric for comparing float outputs?

vfdev-5 (Collaborator) commented Oct 28, 2020

@voldemortX I was also thinking about supporting a tuple of values. Technically it is not that complicated; still, let's finish this feature with a single int or float and work on tuples in another PR.

> Btw, do I need to change the base test class to test float images, e.g. add a new function _create_float_data() or a new metric for comparing float outputs?

Currently we simply cast to float, like here:

tensor = tensor.to(dtype=dt)

so there is no need for a new function.

voldemortX (Contributor Author)

> I was also thinking about supporting a tuple of values. [...] let's finish this feature with a single int or float and work on tuples in another PR.

Great, I'll commit the single-value version soon.

facebook-github-bot commented

Hi @voldemortX!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

vfdev-5 (Collaborator) commented Nov 2, 2020

@voldemortX I checked a bit more, and yes, img * mask + (1 - mask) * fillcolor is a better solution for linear interpolation. It won't match PIL's behaviour, but I think that is not a problem.

vfdev-5 (Collaborator) left a comment

Thanks for the update @voldemortX!
I left a few comments to polish it a little more.

vfdev-5 (Collaborator) commented Nov 3, 2020

@voldemortX let's try to keep it simple. And if we'd like to support a list of ints/floats, let's do it now, like this:

from typing import List
import torch

class A(torch.nn.Module):   
    def __init__(self, fill=0):
        super().__init__()
        # fill can be int, float or list
        # for torchscript: single value should be in the list: [value, ]
        self.fill = fill
        
    def forward(self, x):
        
        fill = self.fill
        if isinstance(x, torch.Tensor) and isinstance(fill, (int, float)):
            fill = [float(fill), ]
        return func(x, fill)
    
def func(x: torch.Tensor, fill: List[float]):    
    if not isinstance(x, torch.Tensor):
        print("Call Pillow with fill: {}".format(fill))
        return
    # dummy tensor func implementation : 
    y = x.clone()
    mask = x < 0.5 * x.max()    
    y[mask] = torch.tensor(fill, dtype=x.dtype)
    return y

For PIL, this should not change anything. The type hint for fill in F.rotate/F.affine will matter only for torchscript. For tensor input we will cast fill = [float(fill), ] in all functions so that torchscript passes.
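A hypothetical usage sketch of the pattern above (A and func are the dummy names from the snippet, not a real torchvision API):

# Eager mode: a bare int fill is accepted and wrapped by forward().
a = A(fill=2)
out = a(torch.rand(4, 4))

# Torchscript-friendly form: pass the single value as a list up front,
# so self.fill is already a List[float] when the module is scripted.
a_list = A(fill=[2.0])
out = a_list(torch.rand(4, 4))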

Btw, I'm trying to land PR #2952, which uniformizes the arg names, so we will have to take that into account.

voldemortX (Contributor Author) commented Nov 4, 2020

> @voldemortX let's try to keep it simple. [...] For tensor input we will cast fill = [float(fill), ] in all functions so that torchscript passes.

Should we just revert to the previous commit, or also implement List support in this PR?

vfdev-5 (Collaborator) commented Nov 4, 2020

@voldemortX I'd say implement List support in this PR.

voldemortX (Contributor Author)

Okay! I'll do it soon with some new unit tests.

voldemortX (Contributor Author)

@vfdev-5 The documentation says n-tuple; should we use Tuple or List?

vfdev-5 (Collaborator) commented Nov 4, 2020

@voldemortX we won't be able to support a single value with Tuple, I think. Let's make it a List.
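A rough illustration of the constraint, assuming torchscript's usual typing rules (a Tuple annotation fixes the number of elements, while List[float] accepts any length):

from typing import List

import torch

@torch.jit.script
def check_fill(fill: List[float]) -> int:
    # List[float] covers both a single value [0.5] and an RGB fill [1.0, 2.0, 3.0]
    return len(fill)

print(check_fill([0.5]), check_fill([1.0, 2.0, 3.0]))  # 1 3

# A Tuple annotation would have to be e.g. Tuple[float, float, float], fixing
# the arity, so one signature could not cover both a single value and a triple.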

voldemortX (Contributor Author)

> we won't be able to support a single value with Tuple, I think. Let's make it a List.

Should the doc be changed then?

vfdev-5 (Collaborator) left a comment

@voldemortX thanks a lot for the update!
I left a nit comment; otherwise it looks good!

@vfdev-5 vfdev-5 marked this pull request as ready for review November 30, 2020 11:16
vfdev-5 (Collaborator) left a comment

@voldemortX I have a few other comments on testing and docstrings.
Since we modify torchvision/transforms/transforms.py here, we also have to add a few tests to torchvision/test/test_transforms_tensor.py.

Thanks for working on this PR, and apologies for how long the merge is taking.

"bands of the image ({} != {})")
raise ValueError(msg.format(len(fill), num_bands))
else:
fill = tuple(fill)
vfdev-5 (Collaborator)

@voldemortX it seems I missed this modification in the previous review. Why do we need to modify the code here? It's on the PIL side, and I think it can be kept as-is?

voldemortX (Contributor Author)

I don't quite remember changing that, actually... I'll revert it to what it was.

voldemortX (Contributor Author) commented Nov 30, 2020

Oh, I remember now. I need the tuple(fill) conversion because the input is now formatted as a List; that also means I can't run tests with tuple fill inputs. @vfdev-5

vfdev-5 (Collaborator)

I see, you are right! Pillow accepts only tuples, and we may receive a list here. Maybe we can make the check more straightforward:

    if isinstance(fill, (list, tuple)):
        if len(fill) != num_bands:
            msg = ("The number of elements in 'fill' does not match the number of "
                   "bands of the image ({} != {})")
            raise ValueError(msg.format(len(fill), num_bands))

        fill = tuple(fill)

voldemortX (Contributor Author)

Sure! I'll include that with the other changes in the next commit.

    mask = mask.expand_as(img)
    fill_img = torch.tensor(fill, dtype=img.dtype, device=img.device).view(1, len(fill), 1, 1).expand_as(img)
    if mode == 'nearest':
        img[mask < 1e-3] = fill_img[mask < 1e-3]  # Leave some room for error
vfdev-5 (Collaborator)

I have a question here; apologies if we've already discussed it. Technically, the mask has the img dtype and we recreate a boolean tensor, so we have to compare against 1 or 0.5, right?

img[mask < 0.5]
# or
img[mask < 1.0]

Why is the threshold 1e-3 here?

voldemortX (Contributor Author)

With nearest interpolation the mask should be all 0/1, so any threshold in (0, 1) should be right? I used this value to leave room for possible rounding errors from CUDA or similar.

vfdev-5 (Collaborator) commented Nov 30, 2020

With nearest the result is already rounded, and we only have to recover boolean mask values. I think we can do either img[mask < 0.5] or mask = mask.bool(). Let's do:

mask = mask < 0.5
img[mask] = fill_img[mask]

which also avoids computing the binary mask twice.

voldemortX (Contributor Author)

Yes, that seems better and faster.
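Pulling the two threads together, a self-contained sketch of the agreed-upon logic; the function name and signature are illustrative, not the exact merged code:

import torch

def apply_fill(img: torch.Tensor, mask: torch.Tensor, fill, mode: str) -> torch.Tensor:
    # img: (1, C, H, W) warped image; mask: (1, 1, H, W) warped all-ones channel.
    mask = mask.expand_as(img)
    fill_img = (torch.tensor(fill, dtype=img.dtype, device=img.device)
                .view(1, len(fill), 1, 1).expand_as(img))
    if mode == 'nearest':
        bool_mask = mask < 0.5  # mask values are exactly 0/1 in this mode
        img = img.clone()
        img[bool_mask] = fill_img[bool_mask]
    else:  # 'bilinear': blend so boundaries match the interpolated mask
        img = img * mask + (1.0 - mask) * fill_img
    return img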

(33, (5, -4), 1.0, [0.0, 0.0], [0, 0, 0]),
(45, [-5, 4], 1.2, [0.0, 0.0], [1, 2, 3]),
(33, (-4, -8), 2.0, [0.0, 0.0], [255, 255, 255]),
(85, (10, -10), 0.7, [0.0, 0.0], None),
vfdev-5 (Collaborator)

Let's add a test here with a single int and a single float as the fill value, i.e. [a_int, ] and (b_float, ).

voldemortX (Contributor Author)

Right! I'll add the tests and docstring changes together in the next commit.

0.03,
msg="{}: {}\n{} vs \n{}".format(
(img_size, r, dt, a, e, c),
for f in [None, [0, 0, 0], [1, 2, 3], [255, 255, 255]]:
vfdev-5 (Collaborator)

Same here: let's add single int and float values (as [a_int, ] and (b_float, )).

@@ -573,10 +573,9 @@ def perspective(
:class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.BILINEAR``.
If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` are supported.
For backward compatibility integer values (e.g. ``PIL.Image.NEAREST``) are still acceptable.
-    fill (n-tuple or int or float): Pixel fill value for area outside the rotated
+    fill (sequence or int or float, optional): Pixel fill value for area outside the rotated
vfdev-5 (Collaborator)

Let's make the docstring more explicit about how it works for PIL, tensor and torchscript:

fill (sequence or int or float, optional): Pixel fill value for the area outside the rotated
    image. If int or float, the value is used for all bands respectively.
    This option is supported for PIL image and Tensor inputs. 
    In torchscript mode single int/float value is not supported, please use a tuple 
    or list of length 1: ``[value, ]``.
    If input is PIL Image, this option is only available for ``Pillow>=5.0.0``.

Same for the other docstrings.

voldemortX (Contributor Author) commented Nov 30, 2020

@vfdev-5 I did some sorting. It seems the current version means the affine functions support:
- int/float/list/None for Tensor,
- int/float/list/tuple/None for PIL,
- list/None for torchscript.

I'm not sure what I should write in the docs. (Tensor and PIL still seem to differ on tuple.)

vfdev-5 (Collaborator)

I thought PIL and Tensor could support the same types. Where exactly is the problem with tuple when the input is a Tensor?

voldemortX (Contributor Author)

I'm not quite sure; let me test it a bit. I wrote the code with only list in mind.

voldemortX (Contributor Author)

You are right, torch.tensor() can also convert tuples. There is no functional mismatch between tensor and PIL.
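(A quick check of that, nothing torchvision-specific:)

import torch

# torch.tensor() accepts tuples and lists alike, so a tuple fill needs no
# special handling on the tensor path.
assert torch.equal(torch.tensor((1.0, 2.0, 3.0)), torch.tensor([1.0, 2.0, 3.0]))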

voldemortX (Contributor Author)

@vfdev-5 This version should cover everything; I added some new unit tests. But why do the macOS tests always fail...

vfdev-5 (Collaborator) commented Nov 30, 2020

@voldemortX the macOS test failure is unrelated. I agree it is annoying to see the CI failing, though; it will probably be fixed in the coming days.
As for tests: since we modify torchvision/transforms/transforms.py in this PR, could you please also add a few tests to torchvision/test/test_transforms_tensor.py to cover the modified classes from transforms.py?

(85, (10, -10), 0.7, [0.0, 0.0], [1, ], 1)

The way you pass the tensor and PIL configs is OK, but I'd pass either [1, ] or 1 and adapt the value for PIL or Tensor respectively in the code...

voldemortX (Contributor Author) commented Dec 1, 2020

Thanks for the review! I added the tests, found some real bugs, and fixed them: specifically, more compact support for tuples/lists, and a num_channels check for tensors mirroring how the old PIL code checks num_bands. For tensors only, [x, ] works the same as a single number because it can be broadcast to match the image.
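A tiny illustration of that broadcast, with made-up shapes:

import torch

img = torch.zeros(3, 4, 4)
fill1 = torch.tensor([7.0])  # [x, ]: a length-1 fill broadcasts over all channels
print((img + fill1).shape)   # torch.Size([3, 4, 4])

fill3 = torch.tensor([1.0, 2.0, 3.0]).view(3, 1, 1)  # per-channel fill
print((img + fill3)[:, 0, 0])  # tensor([1., 2., 3.])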

I also found that in transforms.py we cast everything to List[float], so torchscript should support every input type there as well. I changed the docs a bit accordingly.

P.S. Casting the test fill values for tensor and PIL images in the code rather than in the configs is possible for test_functional_tensor.py; do you mean I should do a conversion like f_pil = int(f[0]) if len(f) == 1 else f?

vfdev-5 (Collaborator) commented Dec 1, 2020

@voldemortX thanks for the update!

> I added the tests, found some real bugs, and fixed them: specifically, more compact support for tuples/lists, and a num_channels check for tensors mirroring how the old PIL code checks num_bands.

Are these bugs in transforms.py only?

> do you mean I should do a conversion like f_pil = int(f[0]) if len(f) == 1 else f?

Yes, something like that can work. Or like this:

test_config = (85, (10, -10), 0.7, [0.0, 0.0], 1),

f_t = [f, ] if isinstance(f, (int, float)) else f

voldemortX (Contributor Author) commented Dec 1, 2020

> Are these bugs in transforms.py only?

I directly ran (85, (10, -10), 0.7, [0.0, 0.0], 1) and found some bugs. All changes in the last commit, excluding the tests, address bugs; the one outside transforms.py is the num_channels check. They should all be fixed now.

EDIT: I'd like to go with f_pil in the tests, since tensor fills have more freedom; e.g., PIL accepts only int/tuple(int)/list[int]-style fills for int images. That seems to be PIL's usual behavior.

vfdev-5 (Collaborator) commented Dec 1, 2020

@voldemortX thanks a lot for the update! It looks good now.
Let me do another pass a bit later, and then it will probably be good to merge...

vfdev-5 (Collaborator) left a comment

@voldemortX a few more nits and I think we are good on this PR. Thanks for working on it!

vfdev-5 (Collaborator) left a comment

Thanks a lot @voldemortX!

vfdev-5 merged commit 21deb4d into pytorch:master on Dec 2, 2020
voldemortX deleted the issue2887 branch on December 2, 2020 at 11:01
vfdev-5 added a commit to Quansight/vision that referenced this pull request Dec 4, 2020
* Fill color support for tensor affine transforms

* PEP fix

* Docstring changes and float support

* Docstring update for transforms and float type cast

* Cast only for Tensor

* Temporary patch for lack of Union type support, plus an extra unit test

* More plausible bilinear filling for tensors

* Keep things simple & New docstrings

* Fix lint and other issues after merge

* make it in one line

* Docstring and some code modifications

* More tests and corresponding changes for transoforms and docstring changes

* Simplify test configs

* Update test_functional_tensor.py

* Update test_functional_tensor.py

* Move assertions

Co-authored-by: vfdev <[email protected]>
facebook-github-bot pushed a commit that referenced this pull request Dec 8, 2020
Summary: same squashed commit list as above.

Reviewed By: datumbox

Differential Revision: D25396712

fbshipit-source-id: 7eb32024c91b67ffa154a481aa592c6e57b3c480

Co-authored-by: vfdev <[email protected]>