[prototype] Speed up adjust_hue_image_tensor #6938

Merged: 7 commits into pytorch:main on Nov 10, 2022

Conversation

@datumbox (Contributor) commented Nov 9, 2022

Related to #6818

Small performance improvement by making use of in-place ops where possible:

[-------------- adjust_hue_image_tensor cpu torch.float32 ---------------]
                         |  adjust_hue_image_tensor old  |     fn2 new    
1 threads: ---------------------------------------------------------------
      (16, 3, 400, 400)  |         467 (+- 13) ms        |  450 (+- 12) ms
      (3, 400, 400)      |          14 (+-  0) ms        |   14 (+-  0) ms
6 threads: ---------------------------------------------------------------
      (16, 3, 400, 400)  |         457 (+- 13) ms        |  451 (+-  9) ms
      (3, 400, 400)      |          18 (+-  0) ms        |   17 (+-  0) ms

Times are in milliseconds (ms).

[--------------- adjust_hue_image_tensor cuda torch.float32 --------------]
                         |  adjust_hue_image_tensor old  |      fn2 new    
1 threads: ----------------------------------------------------------------
      (16, 3, 400, 400)  |        2260 (+-  0) us        |  2200 (+-  1) us
      (3, 400, 400)      |         571 (+-  2) us        |   526 (+-  1) us
6 threads: ----------------------------------------------------------------
      (16, 3, 400, 400)  |        2261 (+- 10) us        |  2200 (+- 15) us
      (3, 400, 400)      |         570 (+- 19) us        |   525 (+-  4) us

Times are in microseconds (us).

[--------------- adjust_hue_image_tensor cpu torch.uint8 ----------------]
                         |  adjust_hue_image_tensor old  |     fn2 new    
1 threads: ---------------------------------------------------------------
      (16, 3, 400, 400)  |         483 (+- 20) ms        |  461 (+- 13) ms
      (3, 400, 400)      |          15 (+-  0) ms        |   15 (+-  0) ms
6 threads: ---------------------------------------------------------------
      (16, 3, 400, 400)  |         487 (+- 15) ms        |  479 (+- 17) ms
      (3, 400, 400)      |          19 (+-  0) ms        |   18 (+-  1) ms

Times are in milliseconds (ms).

[---------------- adjust_hue_image_tensor cuda torch.uint8 ---------------]
                         |  adjust_hue_image_tensor old  |      fn2 new    
1 threads: ----------------------------------------------------------------
      (16, 3, 400, 400)  |        2433 (+-  1) us        |  2365 (+-  0) us
      (3, 400, 400)      |         623 (+-  1) us        |   580 (+-  2) us
6 threads: ----------------------------------------------------------------
      (16, 3, 400, 400)  |        2436 (+-  6) us        |  2366 (+-  0) us
      (3, 400, 400)      |         622 (+- 21) us        |   581 (+-  2) us

Times are in microseconds (us).

cc @vfdev-5 @bjuncek @pmeier
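
Numbers like the above are typically gathered with torch.utils.benchmark. Below is a minimal sketch of such a harness, assuming adjust_hue_old and adjust_hue_new refer to the two implementations being compared (the actual benchmark script for this PR is not part of the thread):

import torch
from torch.utils import benchmark

# adjust_hue_old / adjust_hue_new are stand-ins for the two implementations.
results = []
for device in ("cpu", "cuda"):
    for dtype in (torch.float32, torch.uint8):
        for num_threads in (1, 6):
            for shape in ((16, 3, 400, 400), (3, 400, 400)):
                if dtype is torch.uint8:
                    img = torch.randint(0, 256, shape, dtype=dtype, device=device)
                else:
                    img = torch.rand(shape, device=device)
                for description, fn in (("old", adjust_hue_old), ("new", adjust_hue_new)):
                    results.append(
                        benchmark.Timer(
                            stmt="fn(img, 0.1)",
                            globals={"fn": fn, "img": img},
                            label=f"adjust_hue_image_tensor {device} {dtype}",
                            sub_label=str(shape),
                            description=description,
                            num_threads=num_threads,
                        ).blocked_autorange(min_run_time=5)
                    )
benchmark.Compare(results).print()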

@datumbox (Contributor, Author) left a comment:

Some clarification comments. Frankly, I was expecting a more significant speed improvement given the number of in-place ops we could do here. The reason we don't see one is that adjust_hue is very computation-heavy and one of our slowest transforms: these optimizations shave off a few ms, but that's a small fraction of the total time.

Comment on lines +208 to +210
hg = rc.add(2.0).sub_(bc).mul_(mask_maxc_eq_g & mask_maxc_neq_r)
hr = bc.sub_(gc).mul_(~mask_maxc_neq_r)
hb = gc.add_(4.0).sub_(rc).mul_(mask_maxc_neq_r.logical_and_(mask_maxc_eq_g.logical_not_()))

@datumbox (Contributor, Author):

Changing the order of operations allows us to do more in-place ops.

In particular, once hg is estimated, we can do an in-place op on bc during the hr estimation. Then, in the estimation of hb, we can do in-place ops on gc and the logical masks.
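
To spell out the liveness reasoning, here is the same snippet restated with comments (variable roles as in _rgb_to_hsv; the annotations are mine):

hg = rc.add(2.0).sub_(bc).mul_(mask_maxc_eq_g & mask_maxc_neq_r)
# rc.add(2.0) is out-of-place because rc is still read below when computing hb;
# the fresh result can then be mutated freely (sub_, mul_).
hr = bc.sub_(gc).mul_(~mask_maxc_neq_r)
# bc had its last read in the hg line, so it can now be clobbered in-place.
hb = gc.add_(4.0).sub_(rc).mul_(mask_maxc_neq_r.logical_and_(mask_maxc_eq_g.logical_not_()))
# gc and both masks have no readers after this line, so every op is in-place.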

Comment on lines +224 to +228
sxf = s * f
one_minus_s = 1.0 - s
q = (1.0 - sxf).mul_(v).clamp_(0.0, 1.0)
t = sxf.add_(one_minus_s).mul_(v).clamp_(0.0, 1.0)
p = one_minus_s.mul_(v).clamp_(0.0, 1.0)

@datumbox (Contributor, Author):

Again we reorder to be able to do more in-place ops, and we expand the math ops to reuse components.

We precompute s*f, which is used in both the q and t estimations, and do the same for 1-s. We then estimate q first, so that we can later modify sxf in-place in the t estimation. Finally, one_minus_s can be modified in-place in the estimation of p.
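
Spelled out against the underlying formulas, p = v*(1-s), q = v*(1-s*f), and t = v*(1-s*(1-f)) = v*(1-s+s*f); the annotations below are mine:

sxf = s * f                # shared subexpression for q and t
one_minus_s = 1.0 - s      # shared subexpression for t and p
q = (1.0 - sxf).mul_(v).clamp_(0.0, 1.0)
# (1.0 - sxf) allocates a fresh tensor, so sxf survives for the t line.
t = sxf.add_(one_minus_s).mul_(v).clamp_(0.0, 1.0)
# sxf has no readers after this line, so it is mutated in-place.
p = one_minus_s.mul_(v).clamp_(0.0, 1.0)
# one_minus_s likewise has no further readers, so it is mutated in-place.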

@@ -234,7 +235,7 @@ def _hsv_to_rgb(img: torch.Tensor) -> torch.Tensor:
     a3 = torch.stack((p, p, t, v, v, q), dim=-3)
     a4 = torch.stack((a1, a2, a3), dim=-4)

-    return (a4.mul_(mask.to(dtype=img.dtype).unsqueeze(dim=-4))).sum(dim=-3)
+    return (a4.mul_(mask.unsqueeze(dim=-4))).sum(dim=-3)

@datumbox (Contributor, Author):

This removes an unnecessary cast of mask: it's possible to multiply directly with bools, as we already do for mask_maxc_eq_g etc.
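
A quick self-contained check of the promotion rule (shapes here are arbitrary):

import torch

a4 = torch.rand(6, 3, 4, 4)
mask = torch.rand(3, 4, 4) > 0.5  # dtype is torch.bool

# An in-place mul_ with a bool operand type-promotes the bool to the float
# tensor's dtype on the fly, so an explicit mask.to(dtype=...) is redundant.
out = a4.mul_(mask.unsqueeze(0))
assert out.dtype == torch.float32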

Comment on lines +219 to +221
  h6 = h.mul(6)
  i = torch.floor(h6)
- f = h6 - i
+ f = h6.sub_(i)

@datumbox (Contributor, Author):

Minor optimizations based on @pmeier's finding that mul is preferable to * for Python numbers. Also, h6 can be modified in-place as it's not reused.

@pmeier (Collaborator):

As discussed offline, we only get benefits if we can eliminate a tensor division, which is not the case here.
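
An illustration of the kind of rewrite that does pay off, per the comment above (this example is mine, not from the diff): replacing a tensor division by a Python scalar with a multiplication by its reciprocal.

import torch

img = torch.rand(16, 3, 400, 400)

out_div = img / 255.0           # tensor division by a scalar
out_mul = img.mul(1.0 / 255.0)  # one Python-side reciprocal, then a cheaper multiplication

torch.testing.assert_close(out_div, out_mul)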

@@ -20,6 +19,8 @@ def decode_image_with_pil(encoded_image: torch.Tensor) -> features.Image:

 @torch.jit.unused
 def decode_video_with_av(encoded_video: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, Dict[str, Any]]:
+    import unittest.mock

@datumbox (Contributor, Author):

Just a drive-by change to avoid the hard dependency on unittest. @pmeier said offline that we can clean up many methods that are no longer used; he is going to do this in a separate PR.
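
The pattern in a nutshell (a sketch with the body elided; only the import placement matters):

from typing import Any, Dict, Tuple

import torch

@torch.jit.unused
def decode_video_with_av(encoded_video: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, Dict[str, Any]]:
    # Importing inside the function turns unittest into a soft dependency:
    # it is only needed when video decoding is actually invoked, not when
    # the module is imported.
    import unittest.mock

    ...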

@pmeier (Collaborator):

Will send a PR soon.

@@ -164,6 +164,7 @@ def convert_format_bounding_box(
     if new_format == old_format:
         return bounding_box

+    # TODO: Add _xywh_to_cxcywh and _cxcywh_to_xywh to improve performance

@datumbox (Contributor, Author):

Not the highest priority, since we don't do such conversions internally, but it might be good to offer these two helpers and stop doing two conversions in the future. A sketch of what they could look like follows below.
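
For illustration, a hypothetical sketch of the two helpers named in the TODO (these functions don't exist yet, and the bodies assume floating-point boxes); each would be a single conversion instead of going through XYXY twice:

import torch

def _xywh_to_cxcywh(xywh: torch.Tensor) -> torch.Tensor:
    # (x, y, w, h) with (x, y) the top-left corner -> (cx, cy, w, h).
    x, y, w, h = xywh.unbind(dim=-1)
    return torch.stack((x + w * 0.5, y + h * 0.5, w, h), dim=-1)

def _cxcywh_to_xywh(cxcywh: torch.Tensor) -> torch.Tensor:
    # (cx, cy, w, h) with (cx, cy) the box center -> (x, y, w, h).
    cx, cy, w, h = cxcywh.unbind(dim=-1)
    return torch.stack((cx - w * 0.5, cy - h * 0.5, w, h), dim=-1)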

@pmeier (Collaborator):

I'm OK with that, as the number of formats is low. If that changes in the future, we may need to walk this back or only partially implement 1-to-1 conversions for all formats.

@pmeier (Collaborator) left a comment:

Thanks!

@datumbox merged commit d72e906 into pytorch:main on Nov 10, 2022
@datumbox deleted the perf/hue branch on November 10, 2022
facebook-github-bot pushed a commit that referenced this pull request Nov 14, 2022
Summary:
* Performance optimization on adjust_hue_image_tensor

* handle ints

* Inplace logical ops

* Remove unnecessary casting.

* Fix linter.

Reviewed By: NicolasHug

Differential Revision: D41265196

fbshipit-source-id: f761c1238f42eb1771de520dcea88b74d016f3d2