Add adjustment operations for RGB Tensor Images. #1525


Merged
5 commits merged into pytorch:master on Oct 26, 2019

Conversation

pedrofreire (Contributor)

Right now, we have operations on PIL images, but we want to have a version of the operations that acts directly on Tensor images, as discussed in issue #1375.

Here, we add such operations for adjust_brightness, adjust_contrast and adjust_saturation.

In PIL, those functions are implemented by generating a degenerate image from the original and then interpolating between the two (a minimal sketch of this scheme follows the caveats below):
- https://github.com/python-pillow/Pillow/blob/master/src/PIL/ImageEnhance.py
- https://github.com/python-pillow/Pillow/blob/master/src/libImaging/Blend.c

A few caveats:

  • Since PIL operates on uint8 and the tensor operations might be on float, we can get slightly different values because of int truncation; in particular, in testing, we check that the images are equal up to a small difference.
  • We assume here that the images are RGB; in particular, to handle an alpha channel, we would need to check whether the image has one, in which case we would copy it into the degenerate image.
  • adjust_contrast and adjust_saturation rely on grayscale images; since the pull request [WIP] Add Scriptable Transform: Grayscale #1505 is working on that, I made a temporary function, _rgb_to_grayscale, to be replaced once #1505 is done.
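
For concreteness, here is a minimal sketch of the interpolation scheme described above, written against the PyTorch tensor API and assuming CHW RGB tensors; the helper names mirror the ones discussed in this PR, but the bodies are illustrative rather than the exact merged code.

import torch

def _rgb_to_grayscale(img):
    # Temporary luma transform on a [3, H, W] tensor (ITU-R 601-2 weights
    # assumed here); to be replaced once #1505 is done.
    return 0.299 * img[0] + 0.587 * img[1] + 0.114 * img[2]

def _blend(img1, img2, ratio):
    # Interpolate between the original image and the degenerate one:
    # ratio == 1 returns img1 unchanged, ratio == 0 returns img2.
    bound = 1.0 if img1.is_floating_point() else 255.0
    return (ratio * img1 + (1.0 - ratio) * img2).clamp(0, bound).to(img1.dtype)

def adjust_brightness(img, brightness_factor):
    # The degenerate image for brightness is all black.
    return _blend(img, torch.zeros_like(img), brightness_factor)

def adjust_saturation(img, saturation_factor):
    # The degenerate image for saturation is the grayscale image;
    # broadcasting handles the [3, H, W] vs [H, W] shapes inside _blend.
    return _blend(img, _rgb_to_grayscale(img), saturation_factor)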

codecov-io commented Oct 25, 2019

Codecov Report

Merging #1525 into master will increase coverage by <.01%.
The diff coverage is 66.66%.


@@            Coverage Diff             @@
##           master    #1525      +/-   ##
==========================================
+ Coverage   63.86%   63.87%   +<.01%     
==========================================
  Files          83       83              
  Lines        6507     6525      +18     
  Branches     1005     1008       +3     
==========================================
+ Hits         4156     4168      +12     
- Misses       2057     2060       +3     
- Partials      294      297       +3
Impacted Files Coverage Δ
torchvision/transforms/functional_tensor.py 62.5% <66.66%> (+5.35%) ⬆️


@fmassa (Member) left a comment:

This looks great, thanks a lot for the PR @pedrofreire!

I made a few comments; they are mostly minor (speed / memory optimizations), and I'd like to hear your thoughts about keeping the same dtype as the input tensor.

Comment on lines 59 to 60
l1_diff = (ft_img - f_img).norm(p=1)
self.assertLess(l1_diff, 0.01 * img.nelement())
@fmassa (Member) commented Oct 25, 2019:

What about, instead of computing the L1 norm of the difference, we take the max of the absolute value?
This gives a metric that is invariant to the number of elements, and it gives us a clearer idea of the error due to the uint8 rounding.
So something like

# find a reasonably good threshold
self.assertLess((ft_img - f_img).abs().max(), 0.001)

Plus, if we make the changes I mentioned about keeping the return type of the inputs, I believe we could have 0 error in this function. In that case, we could probably just keep torch.equal there.

@pedrofreire (Contributor, Author) replied:

This sounds reasonable. From running the tests thousands of times, the highest error I got was 4 / 255; it seems theoretically possible to have a truncation error of up to 5 / 255 (we have 5 multiplications, each followed by truncation), so I set that as the bound.
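
For reference, a minimal sketch of the resulting assertion (variable names follow the earlier test snippet; the exact form in the merged test may differ):

# Bound: up to 5 multiply-then-truncate steps, each off by at most 1/255.
self.assertLess((ft_img - f_img).abs().max(), 5.0 / 255)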

@pedrofreire (Contributor, Author):

So, I made the returned tensor have the same dtype, but as it is we still have truncations, since the internal additions still happen in float. The simplest way to resolve this seems to be to do a per-element tensor multiplication, convert the dtype, and then sum-reduce the tensor (sketched below).
That makes the code a bit less clear, though, and since reproducing the truncation doesn't seem very important, I kept it as is.
If you think it is worth having the exact truncation for uint8, I can still add it :)
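
A minimal sketch of the alternative described above (per-element multiplication, dtype conversion, then sum-reduction); the helper name and weights here are assumptions for illustration:

import torch

def _rgb_to_grayscale_truncating(img):
    # Hypothetical variant: weight each channel, truncate every term to the
    # input dtype first, then sum-reduce, so that uint8 inputs get truncated
    # per term instead of only once at the end.
    weights = torch.tensor([0.299, 0.587, 0.114])
    weighted = weights[:, None, None] * img        # [3, H, W], in float
    truncated = weighted.to(img.dtype)             # truncate each term
    return truncated.sum(dim=0, dtype=img.dtype)   # sum-reduce over channels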

@fmassa (Member):

This is fine, but I'd love to understand the truncations you are mentioning. In the Pillow implementation of Blend, there is some casting to float happening, but maybe that is not the truncation you mean.

@pedrofreire (Contributor, Author) commented Oct 26, 2019:

My impression was that the grayscale conversion was doing 3 truncations, and I assumed the interpolation was doing 2; looking at the code, the interpolation only does 1, which would explain why I only saw errors up to 4 / 255 in float (and up to 2 / 255 in uint8) :)

Pedro Freire added 4 commits October 25, 2019 17:24

- We make our operations have input.dtype == output.dtype, at the cost of adding a few type checks and branches.
- By using Tensor broadcasting, we can simplify the calls to _blend (illustrated below).
- It seems Python 2 does not support this type of unpacking, so it broke Python 2 builds. This should fix it.
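
To illustrate the broadcasting commit, a hypothetical before/after of a _blend call, reusing the sketch's helper names; the explicit-expand version is an assumption of what the pre-broadcast code looked like:

# Before: expand grayscale [H, W] to [3, H, W] explicitly before blending.
gray = _rgb_to_grayscale(img).unsqueeze(0).expand(3, -1, -1)
out = _blend(img, gray, saturation_factor)

# After: [3, H, W] and [H, W] broadcast over the channel dimension.
out = _blend(img, _rgb_to_grayscale(img), saturation_factor)
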
@pedrofreire pedrofreire requested a review from fmassa October 25, 2019 19:41
@fmassa (Member) left a comment:

Awesome job, thanks a lot!

@fmassa fmassa merged commit e79cadd into pytorch:master Oct 26, 2019