Fill color support for tensor affine transforms #2904
Conversation
Thanks a lot for the PR @voldemortX!
I left a few comments for minor improvements. We also have to update the docstring for F.affine and others.
@voldemortX this is a good question and you are right, those tests do not pass, as interpolation results differ between PIL and torch. So, for instance, we can skip the tests for such configurations.
Actually, I have another question that needs clearing up: if all we are considering are linear operations (Euclidean transforms and linear interpolation), should the dummy mask always be 0/1 after affine ops? Even if that is the case, I'd still prefer sticking with 0.5 thresholding.
Actually, no:

```python
import torch
from torchvision.transforms.functional import affine

mask = torch.ones(1, 8, 8, dtype=torch.float32)
res = affine(mask, angle=45, translate=[0.1, 0.0], scale=1.0, shear=[0., 0.], resample=2)
res[0, :, :]
```

```
tensor([[0.0000, 0.1866, 0.8938, 1.0000, 1.0000, 1.0000, 0.3281, 0.0000],
        [0.1866, 0.8938, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 0.3281],
        [0.8938, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [0.8938, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [0.1866, 0.8938, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 0.3281],
        [0.0000, 0.1866, 0.8938, 1.0000, 1.0000, 1.0000, 0.3281, 0.0000]])
```
Well, in that case, if we replace a pixel's value with the fillcolor directly in bilinear resample mode (like what I just implemented), it would behave differently from this? Maybe we need another approach.
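To make the contrast concrete, here is a small hypothetical sketch in plain torch (the values and names are invented for illustration) comparing hard replacement against blending the fill value by the resampled mask:

```python
import torch

# Values a warped all-ones mask might take after bilinear resampling:
# 1.0 inside the image, 0.0 fully outside, fractional at the boundary.
mask = torch.tensor([0.0, 0.19, 0.89, 1.0])
img = torch.tensor([0.0, 0.3, 0.7, 0.9])   # resampled image values
fill = 0.5

# Hard replacement: threshold the mask at 0.5 and overwrite those pixels.
hard = torch.where(mask < 0.5, torch.full_like(img, fill), img)

# Blending: weight the image by the mask and the fill by its complement,
# which fades boundary pixels toward the fill value instead of snapping.
blend = mask * img + (1.0 - mask) * fill
```

The two approaches only differ at partially-covered boundary pixels, which is exactly the case discussed above.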
Codecov Report

```
@@            Coverage Diff             @@
##           master    #2904      +/-   ##
==========================================
- Coverage   73.41%   73.32%   -0.09%
==========================================
  Files          99       99
  Lines        8801     8840      +39
  Branches     1389     1397       +8
==========================================
+ Hits         6461     6482      +21
- Misses       1915     1930      +15
- Partials      425      428       +3
==========================================
```

Continue to review the full report at Codecov.
@voldemortX I see what you mean about that.
One more thought that needs checking: affine(), rotate() and perspective() all have no inter-channel operations, so they could all support an n-tuple for the fill color (rather than only a single float or int value). I'm not so sure, though, since the docstring seems ambiguous on this. If this is correct, I'll try to support n-tuples as well. Btw, do I need to change the base class to test for float images, e.g. a new function _create_float_data() or a new metric for comparing float outputs?
@voldemortX I was also thinking about supporting a tuple of values. Technically, it is not that complicated; however, let's accomplish this feature with a single int or float first and work on tuples in another PR.
Currently, we simply cast to float, like here:
vision/test/test_functional_tensor.py, Line 590 in 1fe1e11
so there is no need for a new function.
Great, I'll commit for single value soon.
Hi @voldemortX! Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
@voldemortX I checked a bit more, and yes.
Thanks for the update @voldemortX!
I left a few comments to update it a little more.
@voldemortX let's try to keep it simple. In addition, if we would like to support a list of int/float, let's do it now like that:

```python
from typing import List

import torch


class A(torch.nn.Module):
    def __init__(self, fill=0):
        super().__init__()
        # fill can be int, float or list
        # for torchscript: a single value should be wrapped in a list: [value, ]
        self.fill = fill

    def forward(self, x):
        fill = self.fill
        if isinstance(x, torch.Tensor) and isinstance(fill, (int, float)):
            fill = [float(fill), ]
        return func(x, fill)


def func(x: torch.Tensor, fill: List[float]):
    if not isinstance(x, torch.Tensor):
        print("Call Pillow with fill: {}".format(fill))
        return
    # dummy tensor func implementation:
    y = x.clone()
    mask = x < 0.5 * x.max()
    y[mask] = torch.tensor(fill, dtype=x.dtype)
    return y
```

For PIL, it should not change anything. Btw, I'm trying to land PR #2952 with uniformized arg names, so we will have to take that into account.
Should we just "revert" to the previous commit, or implement for List in this PR also?
@voldemortX I'd say implement for List in this PR.
Okay! I'll do it soon with some new unit tests.
@vfdev-5 The documentation says n-tuple; should we do Tuple or List?
@voldemortX we won't be able to support a single value with Tuple, I think. Let's make it a List.
Should the doc be changed then?
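To illustrate the torchscript constraint being discussed, here is a minimal hypothetical sketch (the function name is invented): a parameter annotated as List[float] scripts fine and accepts a plain Python list, whereas a bare number would not match that signature.

```python
import torch
from typing import List


@torch.jit.script
def make_fill(fill: List[float]) -> torch.Tensor:
    # In torchscript, `fill` must match the List[float] annotation exactly;
    # a single number has to be wrapped by the caller as [value, ].
    return torch.tensor(fill)


out = make_fill([1.0, 2.0, 3.0])  # a plain list satisfies List[float]
```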
@voldemortX thanks a lot for the update!
I left a nit comment; otherwise it looks good!
@voldemortX I have a few other comments on testing and docstrings.
As we modify torchvision/transforms/transforms.py here, we also have to add a few tests to torchvision/test/test_transforms_tensor.py.
Thanks for working on this PR, and apologies for how much time the merge is taking.
```python
            "bands of the image ({} != {})")
    raise ValueError(msg.format(len(fill), num_bands))
else:
    fill = tuple(fill)
```
@voldemortX seems like I missed this modification in the previous review. Why do we need to modify the code here? It's on the PIL side, and I think it can be kept as is?
Don't quite remember changing that actually... I'll get it back to what it was.
Oh, I remember now. I need the tuple(fill) conversion since the input is now formatted as a List; that also means I can't do tests with tuple fill inputs. @vfdev-5
I see, you are right! Pillow accepts only tuples, and we may give it a list. Maybe we can make the check more straightforward:

```python
if isinstance(fill, (list, tuple)):
    if len(fill) != num_bands:
        msg = ("The number of elements in 'fill' does not match the number of "
               "bands of the image ({} != {})")
        raise ValueError(msg.format(len(fill), num_bands))
    fill = tuple(fill)
```
Sure! I'll do that together in the next commit.
```python
mask = mask.expand_as(img)
fill_img = torch.tensor(fill, dtype=img.dtype, device=img.device).view(1, len(fill), 1, 1).expand_as(img)
if mode == 'nearest':
    img[mask < 1e-3] = fill_img[mask < 1e-3]  # Leave some room for error
```
I have a question here; apologies if we've already discussed it. Technically, we have the mask in img's dtype and we recreate a boolean tensor, so we have to compare to 1 or 0.5, right?

```python
img[mask < 0.5]
# or
img[mask < 1.0]
```

Why is the threshold 1e-3 here?
In nearest interpolation, the mask should be all 0/1, so any threshold strictly between 0 and 1 should be right? I use this one to compensate for possible rounding errors from CUDA or something.
In nearest mode the result will already be rounded, and we only have to get back boolean mask values. I think we can do either img[mask < 0.5] or mask = mask.bool(). Let's do:

```python
mask = mask < 0.5
img[mask] = fill_img[mask]
```

which also avoids computing the binary mask twice.
Yes, that seems better and faster.
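Putting the agreed pieces together, here is a self-contained hypothetical sketch of the fill step (not the actual torchvision implementation; the function name, shapes, and the bilinear blending branch are assumptions):

```python
import torch


def fill_outside(img: torch.Tensor, mask: torch.Tensor, fill, mode: str) -> torch.Tensor:
    # `mask` is an all-ones tensor warped with the same grid as `img`,
    # already expanded to img's shape; `fill` is a list of per-channel values.
    fill_img = torch.tensor(fill, dtype=img.dtype, device=img.device).view(1, len(fill), 1, 1).expand_as(img)
    if mode == "nearest":
        bool_mask = mask < 0.5          # compute the binary mask only once
        img[bool_mask] = fill_img[bool_mask]
    else:
        # bilinear: blend boundary pixels toward the fill value
        img = img * mask + (1.0 - mask) * fill_img
    return img


img = torch.ones(1, 3, 2, 2)
mask = torch.ones(1, 3, 2, 2)
mask[:, :, 0, 0] = 0.0                  # pretend this corner fell outside
out = fill_outside(img.clone(), mask, [9.0, 9.0, 9.0], "nearest")
```

In nearest mode the corner pixel is overwritten with the fill value while the rest of the image is untouched.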
test/test_functional_tensor.py
Outdated
```python
(33, (5, -4), 1.0, [0.0, 0.0], [0, 0, 0]),
(45, [-5, 4], 1.2, [0.0, 0.0], [1, 2, 3]),
(33, (-4, -8), 2.0, [0.0, 0.0], [255, 255, 255]),
(85, (10, -10), 0.7, [0.0, 0.0], None),
```
Let's add a test here with a single int and a single float as fill value, as [a_int, ] and (b_float, ).
Right! I'll add the tests and docstring changes together in the next commit.
test/test_functional_tensor.py
Outdated
```python
0.03,
msg="{}: {}\n{} vs \n{}".format(
    (img_size, r, dt, a, e, c),
for f in [None, [0, 0, 0], [1, 2, 3], [255, 255, 255]]:
```
Same here, let's add single int and float values (as [a_int, ] and (b_float, )).
torchvision/transforms/functional.py
Outdated
```diff
@@ -573,10 +573,9 @@ def perspective(
         :class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.BILINEAR``.
         If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` are supported.
         For backward compatibility integer values (e.g. ``PIL.Image.NEAREST``) are still acceptable.
-    fill (n-tuple or int or float): Pixel fill value for area outside the rotated
+    fill (sequence or int or float, optional): Pixel fill value for area outside the rotated
```
Let's make the docstring more explicit about how it works for PIL, tensor and torchscript:

```
fill (sequence or int or float, optional): Pixel fill value for the area outside the rotated
    image. If int or float, the value is used for all bands respectively.
    This option is supported for PIL image and Tensor inputs.
    In torchscript mode a single int/float value is not supported, please use a tuple
    or list of length 1: ``[value, ]``.
    If input is PIL Image, the option is only available for ``Pillow>=5.0.0``.
```
same for other docstrings.
@vfdev-5 I did some sorting. It seems the current version means the affine functions support:
- int/float/list/None for Tensor
- int/float/list/tuple/None for PIL
- list/None for torchscript

Not sure what I should write in the docs (Tensor and PIL still seem to differ on tuple).
I thought PIL and Tensor could support the same types. Where actually is the problem with tuple if the input is a Tensor?
Not very sure; let me test it a bit. I wrote the code with only list in mind.
You are right, torch.tensor() can also convert tuples. There is no functional mismatch between Tensor and PIL.
@vfdev-5 This version should cover things; I rigged up some new unit tests. But why do the macOS tests always fail...
@voldemortX the macOS test failure is unrelated. However, I agree it is annoying to see the CI failing; it will probably be fixed in the coming days.
The way you are passing the tensor and PIL configs is OK, but I'd pass either one.
Thanks for the review! I added the tests, and really did find some bugs and fixed them: specifically, more compact support for tuples/lists, and a check on num_channels like the old PIL code's check on num_bands. For tensors only, [x, ] works the same as a single number, since it can be broadcast to match the image. I also found that in transforms.py we cast everything to List[float], so torchscript should support every input type there; I changed the docs a bit accordingly. p.s. Casting test fill values for tensor and PIL images respectively in the code rather than in the configs is possible for test_functional_tensor.py; do you mean I should do a conversion like that?
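The broadcasting point can be illustrated with a tiny sketch (assumed shapes, not the library code): a length-1 fill viewed as (1, 1, 1, 1) expands over all channels, so [x, ] behaves like a single number for tensors:

```python
import torch

img = torch.zeros(1, 3, 4, 4)
fill = [7.0]                      # single value wrapped in a list
# The size-1 channel dim broadcasts across all 3 channels of img.
fill_img = torch.tensor(fill).view(1, len(fill), 1, 1).expand_as(img)
```

With a 3-element fill the same `view`/`expand_as` maps one value per channel instead.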
@voldemortX thanks for the update!
Are these bugs in transforms.py only?
Yes, something like that can work.
I directly ran (85, (10, -10), 0.7, [0.0, 0.0], 1) and found some bugs. All changes in the last commit, excluding the tests, address bugs; one outside transforms.py is the num_channels check. They should be alright now. EDIT: I'd like to go with f_pil in the tests, since tensor fills have more freedom, e.g. PIL only accepts int/tuple(int)/list[int] for int images; that seems to be PIL's usual behavior.
@voldemortX thanks a lot for the update! Looks good now.
@voldemortX a few more nits and I think we are good for this PR. Thanks for working on it!
Thanks a lot @voldemortX!
* Fill color support for tensor affine transforms * PEP fix * Docstring changes and float support * Docstring update for transforms and float type cast * Cast only for Tensor * Temporary patch for lack of Union type support, plus an extra unit test * More plausible bilinear filling for tensors * Keep things simple & New docstrings * Fix lint and other issues after merge * make it in one line * Docstring and some code modifications * More tests and corresponding changes for transforms and docstring changes * Simplify test configs * Update test_functional_tensor.py * Update test_functional_tensor.py * Move assertions Co-authored-by: vfdev <[email protected]>
Summary: same commit list as above. Reviewed By: datumbox. Differential Revision: D25396712. fbshipit-source-id: 7eb32024c91b67ffa154a481aa592c6e57b3c480. Co-authored-by: vfdev <[email protected]>
PR for Issue #2887
It seems that affine(), rotate() and perspective() all use _apply_grid_transform(), so I added the code there.
I ran the unit tests for all 3 functions and they passed, but I can't figure out whether the behavior is right for bilinear, e.g. if I change
vision/test/test_functional_tensor.py
Line 553 in f9e31a6
p.s. I only have CUDA 10.0 at the moment, so I had to bypass the version check in torchvision/extensions.py; not sure if that caused the above-mentioned test failures.