
[prototype] Speed up adjust_contrast_image_tensor #6933


Merged: 2 commits merged into pytorch:main from the perf/contrast branch on Nov 9, 2022

Conversation

@datumbox (Contributor) commented Nov 9, 2022

Related to #6818

Small performance improvement for uint8 images by avoiding the double casting from floats to ints and back to floats.
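For orientation, here is a minimal sketch of the idea, not the exact torchvision kernel: the grayscale intermediate stays in float and is only floored for integer inputs, instead of being cast to uint8 and back to float. The luma coefficients and the [0, 1] / [0, 255] value ranges are assumptions for the sketch.

import torch

def adjust_contrast_sketch(image: torch.Tensor, contrast_factor: float) -> torch.Tensor:
    # assumes a 3-channel image laid out as (..., C, H, W)
    fp = image.is_floating_point()

    # ITU-R 601-2 luma transform, computed directly in float
    r, g, b = image.unbind(dim=-3)
    gray = 0.2989 * r + 0.587 * g + 0.114 * b
    if not fp:
        # mimic the truncation that the old uint8 round trip performed
        gray = gray.floor_()

    # per-image grayscale mean, broadcast back over C, H, W
    mean = gray.mean(dim=(-2, -1), keepdim=True).unsqueeze(-3)

    out = image * contrast_factor + mean * (1.0 - contrast_factor)
    bound = 1.0 if fp else 255.0
    return out.clamp_(0.0, bound).to(image.dtype)

For a uint8 input this keeps a single float pass around the grayscale step rather than a float -> int -> float round trip.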

Floats remain unaffected; uint8s get roughly a 5% boost:

[----------- adjust_contrast_image_tensor cpu torch.float32 -----------]
                         |  adjust_contrast_image_tensor old  |  fn2 new
1 threads: -------------------------------------------------------------
      (16, 3, 400, 400)  |               13600                |   13500 
      (3, 400, 400)      |                 538                |     535 
6 threads: -------------------------------------------------------------
      (16, 3, 400, 400)  |               14300                |   14300 
      (3, 400, 400)      |                 838                |     834 

Times are in microseconds (us).

[---------- adjust_contrast_image_tensor cuda torch.float32 -----------]
                         |  adjust_contrast_image_tensor old  |  fn2 new
1 threads: -------------------------------------------------------------
      (16, 3, 400, 400)  |                196                 |    196  
      (3, 400, 400)      |                 64                 |     62  
6 threads: -------------------------------------------------------------
      (16, 3, 400, 400)  |                197                 |    197  
      (3, 400, 400)      |                 64                 |     62  

Times are in microseconds (us).

[------------ adjust_contrast_image_tensor cpu torch.uint8 ------------]
                         |  adjust_contrast_image_tensor old  |  fn2 new
1 threads: -------------------------------------------------------------
      (16, 3, 400, 400)  |               26100                |   25000 
      (3, 400, 400)      |                1109                |     999 
6 threads: -------------------------------------------------------------
      (16, 3, 400, 400)  |               28300                |   30000 
      (3, 400, 400)      |                1700                |    1550 

Times are in microseconds (us).

[----------- adjust_contrast_image_tensor cuda torch.uint8 ------------]
                         |  adjust_contrast_image_tensor old  |  fn2 new
1 threads: -------------------------------------------------------------
      (16, 3, 400, 400)  |               257.3                |    242  
      (3, 400, 400)      |                89.7                |     79  
6 threads: -------------------------------------------------------------
      (16, 3, 400, 400)  |               258.1                |    242  
      (3, 400, 400)      |                90.2                |     80  

Times are in microseconds (us).

cc @vfdev-5 @bjuncek @pmeier

if c == 3:
    grayscale_image = _rgb_to_gray(image, cast=False)
    if not fp:
        grayscale_image = grayscale_image.floor_()
@datumbox (Contributor, Author) commented:
@pmeier In an early iteration of the PR I had missed this floor_() call, which is necessary to reproduce results identical to stable. Unfortunately the tests passed without catching the issue. It might be worth checking that our reference tests cover both floats and ints.
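A toy illustration of the effect, with made-up values: without floor_(), the float grayscale keeps the fractional part that the old uint8 cast used to drop, which shifts the per-image mean that the contrast blend uses.

import torch

gray_float = torch.tensor([100.7, 53.2, 201.9])  # hypothetical grayscale values
print(gray_float.mean())          # tensor(118.6000)
print(gray_float.floor().mean())  # tensor(118.)  <- matches the old int-cast path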

Collaborator:
This is properly tested with

def reference_inputs_adjust_contrast_image_tensor():
    for image_loader, contrast_factor in itertools.product(
        make_image_loaders(color_spaces=(features.ColorSpace.GRAY, features.ColorSpace.RGB), extra_dims=[()]),
        _ADJUST_CONTRAST_FACTORS,
    ):
        yield ArgsKwargs(image_loader, contrast_factor=contrast_factor)

but it seems our tolerances are too high to pick up on this:

closeness_kwargs=DEFAULT_PIL_REFERENCE_CLOSENESS_KWARGS,

DEFAULT_PIL_REFERENCE_CLOSENESS_KWARGS = {
    (("TestKernels", "test_against_reference"), torch.float32, "cpu"): dict(atol=1e-5, rtol=0, agg_method="mean"),
    (("TestKernels", "test_against_reference"), torch.uint8, "cpu"): dict(atol=1e-5, rtol=0, agg_method="mean"),
}

However, we can't remove the tolerances completely here, since the output differs even with the current implementation.

@datumbox (Contributor, Author) replied:
Hmm... 99% of the performance optimizations were tested for training and we know they are correct, so there might be rounding-error issues. I think we should adjust the tolerances to a degree that catches BC issues but doesn't go ballistic over the 8th decimal. That might be a hard exercise though.
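An alternative that is independent of the PIL tolerances would be a direct consistency check against the stable kernel, which should flag a missing floor_() since the uint8 outputs are expected to match stable exactly. A hypothetical sketch, assuming the prototype import path and kernel name:

import torch
from torchvision.transforms import functional as F_stable
from torchvision.prototype.transforms import functional as F_proto  # assumed prototype namespace

def check_uint8_matches_stable():
    torch.manual_seed(0)
    image = torch.randint(0, 256, (3, 400, 400), dtype=torch.uint8)
    expected = F_stable.adjust_contrast(image, contrast_factor=0.5)
    actual = F_proto.adjust_contrast_image_tensor(image, contrast_factor=0.5)
    # uint8 outputs are expected to be bit-identical to stable
    torch.testing.assert_close(actual, expected, atol=0, rtol=0)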

@vfdev-5 (Collaborator) left a comment:
Looks good, thanks @datumbox

@datumbox (Contributor, Author) commented Nov 9, 2022:

@vfdev-5 thanks. It seems there is an unrelated flaky test about resizing bboxes:
https://github.com/pytorch/vision/actions/runs/3427555648/jobs/5710704975

Could you have a look?

FAILED test/test_prototype_transforms_functional.py::TestKernels::test_against_reference[resize_bounding_box-55] - AssertionError: Tensor-likes are not equal!

Mismatched elements: 2 / 16 (12.5%)
Greatest absolute difference: 1 at index (2, 2)
Greatest relative difference: 0.25 at index (2, 3)

The failure occurred for item [0]
FAILED test/test_prototype_transforms_functional.py::TestKernels::test_against_reference[resize_bounding_box-56] - AssertionError: Tensor-likes are not equal!

Mismatched elements: 2 / 16 (12.5%)
Greatest absolute difference: 1 at index (2, 2)
Greatest relative difference: 0.25 at index (2, 3)

The failure occurred for item [0]
FAILED test/test_prototype_transforms_functional.py::TestKernels::test_against_reference[resize_bounding_box-57] - AssertionError: Tensor-likes are not equal!

Mismatched elements: 2 / 16 (12.5%)
Greatest absolute difference: 1 at index (2, 2)
Greatest relative difference: 0.25 at index (2, 3)

The failure occurred for item [0]

@datumbox datumbox merged commit 10d47a6 into pytorch:main Nov 9, 2022
@datumbox datumbox deleted the perf/contrast branch November 9, 2022 16:04
facebook-github-bot pushed a commit that referenced this pull request Nov 14, 2022
Summary:
* Avoid double casting on adjust_contrast

* Handle properly ints.

Reviewed By: NicolasHug

Differential Revision: D41265198

fbshipit-source-id: 4bacdc743b3cdf55a43e0ca6185b9c9b2ab12160