Add adjustment operations for RGB Tensor Images. #1525
Conversation
Right now, we have operations on PIL images, but we want to have a version of the operations that act directly on Tensor images, as discussed in issue #1375. Here, we add such operations for adjust_brightness, adjust_contrast and adjust_saturation.

In PIL, those functions are implemented by generating a degenerate image from the first, and then interpolating them together:
- https://github.com/python-pillow/Pillow/blob/master/src/PIL/ImageEnhance.py
- https://github.com/python-pillow/Pillow/blob/master/src/libImaging/Blend.c

A few caveats:
* Since PIL operates on uint8, and the tensor operations might be on float, we can get slightly different values because of int truncation.
* We assume here that the images are RGB; in particular, to handle an alpha channel, we would need to check whether it is present, and if so copy it to the final image.
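For illustration, the degenerate-image-plus-interpolation scheme described above could be sketched roughly as follows. This is a minimal sketch for float images in [0, 1]; the `_blend` helper name and exact factor semantics here are assumptions based on the description, not the PR's final code:

```python
import torch

def _blend(img1, img2, ratio):
    # Linear interpolation between two images: ratio=0 gives img2,
    # ratio=1 gives img1, ratio>1 extrapolates (enhances img1).
    return (ratio * img1 + (1.0 - ratio) * img2).clamp(0, 1)

def adjust_brightness(img, brightness_factor):
    # The "degenerate" image for brightness is all zeros (black),
    # so interpolating toward it darkens the input.
    return _blend(img, torch.zeros_like(img), brightness_factor)

img = torch.rand(3, 4, 4)
out = adjust_brightness(img, 0.5)  # halve brightness
```

Contrast and saturation follow the same pattern, with the degenerate image being the mean gray image and the grayscale image, respectively.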
Codecov Report
@@ Coverage Diff @@
## master #1525 +/- ##
==========================================
+ Coverage 63.86% 63.87% +<.01%
==========================================
Files 83 83
Lines 6507 6525 +18
Branches 1005 1008 +3
==========================================
+ Hits 4156 4168 +12
- Misses 2057 2060 +3
- Partials 294 297 +3
Continue to review full report at Codecov.
This looks great, thanks a lot for the PR @pedrofreire !
I made a few comments; they are mostly minor (speed / memory optimizations), and I'd like to have your thoughts about keeping the same dtype as the input tensor.
test/test_functional_tensor.py
l1_diff = (ft_img - f_img).norm(p=1)
self.assertLess(l1_diff, 0.01 * img.nelement())
What about instead of computing the l1 norm of the difference, we take the max of the absolute value? This gives a metric which is invariant to the number of elements, and we have a clearer idea of what the error due to the uint8 rounding is.
So something like
# find a reasonably good threshold
self.assertLess((ft_img - f_img).abs().max(), 0.001)
Plus, I think that if we make the changes I mentioned about keeping the return type of the inputs, we could have zero error in this function. In that case, we could probably just use torch.equal there.
This sounds reasonable. From running the tests thousands of times, the highest error I got was 4 / 255; it seems theoretically possible to have a truncation error of up to 5 / 255 (we have 5 multiplications, each followed by truncation), so I set that as the bound.
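As a toy illustration of why repeated multiply-then-truncate steps accumulate error (illustrative only; the counts 5 and 255 come from the discussion above, and the factor here is made up):

```python
def mul_trunc(v, factor):
    """Multiply then truncate to an integer, as uint8 pixel math does."""
    return int(v * factor)

v = 99
factor = 1.004
exact = v * factor ** 3              # pure float chain: ~100.19
trunc = v
for _ in range(3):                   # three multiply-truncate steps
    trunc = mul_trunc(trunc, factor) # stays at 99: each step truncates
# The truncated chain can lag the float chain by roughly one unit per step,
# which is why k truncations bound the error by about k / 255 on [0, 1] images.
```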
So, I made the returned tensor have the same dtype, but as it stands we still have truncation differences, since the internal additions still happen in float. The simplest way to resolve this seems to be to do a tensor multiplication per element, convert the dtype, and then sum-reduce the tensor.
That makes the code a bit less clear, though, and since reproducing the truncation doesn't seem very important, I kept it as is.
If you think it is worth having the exact uint8 truncation, I can still add it :)
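The per-element multiply, dtype-convert, then sum-reduce idea mentioned above could be sketched like this for the grayscale step (a sketch only; the function name is hypothetical, and the ITU-R 601 weights are an assumption about which conversion is used):

```python
import torch

def rgb_to_grayscale_trunc(img):
    # img: uint8 tensor of shape (3, H, W).
    # Multiply each channel by its weight and truncate to uint8 *before*
    # summing, mimicking per-term integer truncation; then sum-reduce
    # over the channel dimension.
    weights = torch.tensor([0.299, 0.587, 0.114])
    weighted = (img.float() * weights[:, None, None]).to(torch.uint8)
    return weighted.sum(dim=0, dtype=torch.uint8)
```

This keeps everything in uint8 at the cost of readability, which matches the trade-off described in the comment.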
This is fine, but I'd love to understand the truncations you are mentioning. From the Pillow implementation of Blend, there is some casting to float happening, but maybe this is not the truncation that you mention.
My impression was that the grayscale conversion does 3 truncations, and I assumed the interpolation does 2; looking at the code, the interpolation only does 1, which would explain why I only saw errors of up to 4 / 255 in float (and up to 2 / 255 in uint8) :)
- We make our operations have input.dtype == output.dtype, at the cost of adding a few type checks and branches.
- By using Tensor broadcasting, we can simplify the calls to _blend.
It seems Python 2 does not support this type of unpacking, so it broke Python 2 builds. This should fix it.
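For context, the kind of unpacking that breaks Python 2 is likely extended iterable unpacking (PEP 3132); the actual offending line isn't shown here, so this is a guess at the construct involved:

```python
# Python 3 only: starred assignment targets (PEP 3132).
first, *rest = [1, 2, 3, 4]   # SyntaxError under Python 2

# A Python 2-compatible equivalent uses explicit slicing:
seq = [1, 2, 3, 4]
first, rest = seq[0], seq[1:]
```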
Awesome job, thanks a lot!