
[prototype] Speed up autocontrast_image_tensor #6935


Merged: 3 commits merged into pytorch:main on Nov 9, 2022

Conversation

@datumbox (Contributor) commented Nov 9, 2022

Related to #6818

A performance improvement for uint8 images:

[-------------------------- autocontrast_image_tensor cpu torch.float32 --------------------------]
                         |  autocontrast_image_tensor old  |      fn2 new       |      fn3 new     
1 threads: ----------------------------------------------------------------------------------------
      (16, 3, 400, 400)  |         14050 (+-359) us        |  14020 (+-225) us  |  14085 (+-297) us
      (3, 400, 400)      |          528 (+-  1) us         |   529 (+-  1) us   |   533 (+-  1) us 
6 threads: ----------------------------------------------------------------------------------------
      (16, 3, 400, 400)  |         14855 (+-242) us        |  14814 (+- 60) us  |  14752 (+-447) us
      (3, 400, 400)      |          745 (+-  5) us         |   747 (+- 20) us   |   752 (+- 13) us 

Times are in microseconds (us).

[------------------------ autocontrast_image_tensor cuda torch.float32 -----------------------]
                         |  autocontrast_image_tensor old  |     fn2 new      |     fn3 new    
1 threads: ------------------------------------------------------------------------------------
      (16, 3, 400, 400)  |          224 (+-  0) us         |  223 (+-  0) us  |  223 (+-  0) us
      (3, 400, 400)      |           97 (+-  0) us         |   82 (+-  0) us  |   87 (+-  1) us
6 threads: ------------------------------------------------------------------------------------
      (16, 3, 400, 400)  |          224 (+-  3) us         |  224 (+-  2) us  |  224 (+-  1) us
      (3, 400, 400)      |           97 (+-  2) us         |   82 (+-  1) us  |   87 (+-  2) us

Times are in microseconds (us).

[--------------------------- autocontrast_image_tensor cpu torch.uint8 ---------------------------]
                         |  autocontrast_image_tensor old  |      fn2 new       |      fn3 new     
1 threads: ----------------------------------------------------------------------------------------
      (16, 3, 400, 400)  |         20519 (+-200) us        |  14527 (+- 50) us  |  17883 (+- 85) us
      (3, 400, 400)      |         1029 (+-  7) us         |   828 (+-  6) us   |  1025 (+-  8) us 
6 threads: ----------------------------------------------------------------------------------------
      (16, 3, 400, 400)  |         21208 (+-394) us        |  15201 (+-328) us  |  18550 (+-484) us
      (3, 400, 400)      |         1336 (+- 29) us         |  1131 (+- 27) us   |  1313 (+- 50) us 

Times are in microseconds (us).

[------------------------- autocontrast_image_tensor cuda torch.uint8 ------------------------]
                         |  autocontrast_image_tensor old  |     fn2 new      |     fn3 new    
1 threads: ------------------------------------------------------------------------------------
      (16, 3, 400, 400)  |          236 (+-  0) us         |  275 (+-  0) us  |  231 (+-  0) us
      (3, 400, 400)      |          123 (+-  1) us         |   98 (+-  1) us  |  106 (+-  1) us
6 threads: ------------------------------------------------------------------------------------
      (16, 3, 400, 400)  |          235 (+-  2) us         |  273 (+-  1) us  |  231 (+-  3) us
      (3, 400, 400)      |          123 (+-  2) us         |   98 (+-  1) us  |  107 (+-  2) us

Times are in microseconds (us).

fn2 is the submitted variant. For uint8 inputs it is about 30% faster on CPU but up to 15% slower on GPU.

fn3 is another candidate (not included in this PR). It's 13% faster on CPU and about the same on GPU. Here is the implementation:

def fn3(image: torch.Tensor) -> torch.Tensor:
    c = image.shape[-3]
    if c not in [1, 3]:
        raise TypeError(f"Input image tensor permitted channel values are {[1, 3]}, but found {c}")

    if image.numel() == 0:
        # exit early on empty images
        return image

    bound = _FT._max_value(image.dtype)  # e.g. 255 for uint8, 1.0 for floats
    fp = image.is_floating_point()
    dtype = image.dtype if fp else torch.float32

    # per-channel min/max over the spatial dims
    minimum = image.amin(dim=(-2, -1), keepdim=True)
    maximum = image.amax(dim=(-2, -1), keepdim=True)
    eq_idxs = maximum == minimum
    if not fp:
        maximum = maximum.to(dtype)

    inv_scale = maximum.sub_(minimum).div_(bound)  # (max - min) / bound
    # leave degenerate channels (max == min) unchanged
    minimum[eq_idxs] = 0.0
    inv_scale[eq_idxs] = 1.0

    # out= variant: saves one intermediate allocation, but see below re autograd
    output = torch.empty_like(image, dtype=dtype)
    torch.sub(image, minimum, out=output)
    return output.div_(inv_scale).clamp_(0, bound).to(image.dtype)
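A minimal numeric check of the formula fn3 implements (stretching each channel so its min maps to 0 and its max to the dtype bound). This is a standalone sketch: `_FT._max_value(torch.uint8)` is replaced with the literal 255, and rounding before the uint8 cast is added purely for this illustration.

```python
import torch

# three channels spanning [50, 113], [114, 177], [178, 241]
img = (torch.arange(192, dtype=torch.float32).reshape(3, 8, 8) + 50).to(torch.uint8)
bound = 255.0  # stand-in for _FT._max_value(torch.uint8)

minimum = img.amin(dim=(-2, -1), keepdim=True).to(torch.float32)
maximum = img.amax(dim=(-2, -1), keepdim=True).to(torch.float32)
inv_scale = (maximum - minimum) / bound  # (max - min) / bound, as in fn3

out = ((img - minimum) / inv_scale).clamp_(0, bound).round().to(torch.uint8)
print(out.amin(dim=(-2, -1)).tolist())  # each channel min stretched to 0
print(out.amax(dim=(-2, -1)).tolist())  # each channel max stretched to 255
```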

In offline discussions with @pmeier and @vfdev-5, we decided to go with fn2 because it optimizes the most common use case, which is running the op on CPU. Moreover, the fn3 variant uses the out= idiom, which doesn't play nicely with autograd.
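The fn2 variant itself is not reproduced in this thread. A plausible sketch of what an out=-free version looks like, inferred from fn3 above (the name and details are hypothetical; the actual fn2 is in the PR diff):

```python
import torch

def fn2_sketch(image: torch.Tensor) -> torch.Tensor:
    # Hypothetical reconstruction, not the PR's actual code.
    bound = 255.0 if image.dtype == torch.uint8 else 1.0  # stand-in for _FT._max_value
    dtype = image.dtype if image.is_floating_point() else torch.float32

    minimum = image.amin(dim=(-2, -1), keepdim=True).to(dtype)
    maximum = image.amax(dim=(-2, -1), keepdim=True).to(dtype)
    eq_idxs = maximum == minimum
    inv_scale = maximum.sub_(minimum).div_(bound)  # (max - min) / bound
    # leave degenerate channels (max == min) unchanged
    minimum[eq_idxs] = 0.0
    inv_scale[eq_idxs] = 1.0

    # Plain out-of-place sub instead of torch.sub(..., out=...):
    # allocates one intermediate but stays autograd-friendly.
    return image.sub(minimum).div_(inv_scale).clamp_(0, bound).to(image.dtype)
```
The key difference from fn3 is only the last step: `image.sub(minimum)` lets autograd track the op, where `torch.sub(..., out=output)` would not.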

cc @vfdev-5 @bjuncek @pmeier

@pmeier (Collaborator) left a comment
LGTM if CI is green (with the caveat of #6934). Thanks Vasilis!

@datumbox (Contributor, Author) commented Nov 9, 2022

@pmeier I get failures with:

Mismatched elements: 987 / 2772 (35.6%)
Greatest absolute difference: 5.960464477539063e-08 at index (0, 0, 0, 2)
Greatest relative difference: 1.1880246168204803e-07 at index (3, 1, 6, 11)

I thought we were using higher tolerances. Do you think this threshold is reasonable?

@pmeier (Collaborator) commented Nov 9, 2022

The failure happens in the consistency tests which test for equality by default

self.closeness_kwargs = closeness_kwargs or dict(rtol=0, atol=0)

You can add this

# Use default tolerances of `torch.testing.assert_close`
closeness_kwargs=dict(rtol=None, atol=None),

to

ConsistencyConfig(
    prototype_transforms.RandomAutocontrast,
    legacy_transforms.RandomAutocontrast,
    [
        ArgsKwargs(p=0),
        ArgsKwargs(p=1),
    ],
),

for reasonable default tolerances. We needed to do this for a few ops where we changed algorithms and in turn got slight deviations from v1.
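The difference between the exact comparison and the default tolerances can be seen directly. A small standalone illustration (not from the test suite; for float32, `torch.testing.assert_close` defaults to rtol=1.3e-6, atol=1e-5):

```python
import torch

a = torch.tensor([1.0])
b = a + 5e-7  # representable perturbation, well inside float32 default tolerances

# rtol=None, atol=None selects the per-dtype defaults, so this passes:
torch.testing.assert_close(a, b, rtol=None, atol=None)

# The consistency tests' rtol=0, atol=0 demand bitwise-equal values:
try:
    torch.testing.assert_close(a, b, rtol=0, atol=0)
    exact_ok = True
except AssertionError:
    exact_ok = False
print(exact_ok)  # the exact comparison rejects the tiny deviation
```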

@datumbox (Contributor, Author) commented Nov 9, 2022

@pmeier Thanks for the advice. Worked locally. I'll rerun the tests to be sure.

@datumbox (Contributor, Author) commented Nov 9, 2022

The failing test is the false positive that @vfdev-5 is currently investigating (see #6933 (comment))

@datumbox datumbox merged commit ffd5a56 into pytorch:main Nov 9, 2022
@datumbox datumbox deleted the perf/autocontrast branch November 9, 2022 17:30
facebook-github-bot pushed a commit that referenced this pull request Nov 14, 2022
Summary:
* Performance optimization for autocontrast

* Fixing tests

Reviewed By: NicolasHug

Differential Revision: D41265202

fbshipit-source-id: cd1f9f777ecf56168def256a2ef04335a602684b