[proto] Small improvement for tensor equalize op #6738

Merged
merged 5 commits into from
Oct 11, 2022

vfdev-5
Collaborator

@vfdev-5 vfdev-5 commented Oct 11, 2022

Time benchmark: RandomEqualize (1.0,) None
V2: RandomEqualize(p=1.0) torchvision.prototype.transforms._color
Stable: RandomEqualize(p=1.0) torchvision.transforms.transforms

Main:

[- Classification transforms measurements -]
                         |  stable  |    v2
1 threads: ---------------------------------
      Tensor Image data  |  2.875   |  3.184

Times are in milliseconds (ms).

This PR:

[- Classification transforms measurements -]
                         |  stable  |    v2
1 threads: ---------------------------------
      Tensor Image data  |  2.883   |  2.874

Times are in milliseconds (ms).
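
For readers who want to produce a table of this shape: the layout above matches what `torch.utils.benchmark.Compare` prints. The sketch below is an assumption about the harness, not the script used for this PR, and `op_stable` / `op_v2` are hypothetical stand-ins for the two `RandomEqualize` implementations.

```python
import torch
from torch.utils import benchmark


def op_stable(img: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the stable transform (not the real op).
    return (255 - img.to(torch.int64)).to(torch.uint8)


def op_v2(img: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the prototype (v2) transform (not the real op).
    return 255 - img


img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)

results = []
for description, fn in [("stable", op_stable), ("v2", op_v2)]:
    timer = benchmark.Timer(
        stmt="fn(img)",
        globals={"fn": fn, "img": img},
        num_threads=1,
        label="Classification transforms measurements",  # table title
        sub_label="Tensor Image data",                   # row name
        description=description,                         # column name
    )
    # blocked_autorange picks the number of runs needed for a stable estimate.
    results.append(timer.blocked_autorange(min_run_time=0.2))

# Prints one row per sub_label and one column per description.
benchmark.Compare(results).print()
```

Swapping the stand-ins for the real transforms keeps the rest of the harness unchanged.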

Here are the cProfile logs showing the reduction in the number of calls:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
-     6000    3.918    0.001    6.354    0.001 /vision/torchvision/transforms/functional_tensor.py:870(_scale_channel)
-     6000    1.216    0.000    1.216    0.000 {built-in method torch.bincount}
-    12000    0.957    0.000    0.957    0.000 {method 'to' of 'torch._C._TensorBase' objects}
-     4000    0.360    0.000    0.360    0.000 {built-in method torch.stack}
+     6000    3.433    0.001    5.380    0.001 /vision/torchvision/prototype/transforms/functional/_color.py:186(_scale_channel)
+     6000    1.229    0.000    1.229    0.000 {built-in method torch.bincount}
+    12000    0.454    0.000    0.454    0.000 {method 'to' of 'torch._C._TensorBase' objects}
+     2000    0.187    0.000    0.187    0.000 {built-in method torch.stack}

Main (12adc54):

   660002 function calls in 7.247 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     6000    3.918    0.001    6.354    0.001 /vision/torchvision/transforms/functional_tensor.py:870(_scale_channel)
     6000    1.216    0.000    1.216    0.000 {built-in method torch.bincount}
    12000    0.957    0.000    0.957    0.000 {method 'to' of 'torch._C._TensorBase' objects}
     4000    0.360    0.000    0.360    0.000 {built-in method torch.stack}
    18000    0.080    0.000    0.080    0.000 {built-in method torch.div}
     2000    0.069    0.000    6.424    0.003 /vision/torchvision/transforms/functional_tensor.py:892(<listcomp>)
     6000    0.061    0.000    0.061    0.000 {built-in method torch.nn.functional.pad}
    10000    0.043    0.000    0.043    0.000 {method 'view' of 'torch._C._TensorBase' objects}
     2000    0.040    0.000    6.984    0.003 /vision/torchvision/prototype/transforms/_transform.py:66(forward)
     2000    0.038    0.000    0.141    0.000 /usr/lib/python3.8/traceback.py:321(extract)
    10000    0.037    0.000    0.037    0.000 {built-in method posix.stat}
     6000    0.036    0.000    0.036    0.000 {method 'clamp' of 'torch._C._TensorBase' objects}
     6000    0.034    0.000    0.034    0.000 {method 'sum' of 'torch._C._TensorBase' objects}
     6000    0.031    0.000    0.031    0.000 {built-in method torch.cumsum}
     2000    0.030    0.000    0.053    0.000 /usr/lib/python3.8/traceback.py:388(format)
     2000    0.029    0.000    6.642    0.003 /vision/torchvision/transforms/functional_tensor.py:891(_equalize_single_image)
     2000    0.023    0.000    0.023    0.000 {built-in method torch.rand}
     2000    0.017    0.000    6.880    0.003 /vision/torchvision/prototype/transforms/functional/_color.py:186(equalize_image_tensor)
    54000    0.014    0.000    0.033    0.000 /usr/lib/python3.8/traceback.py:285(line)
    36000    0.012    0.000    0.012    0.000 {method 'format' of 'str' objects}

This PR:

   652002 function calls in 6.011 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     6000    3.433    0.001    5.380    0.001 /vision/torchvision/prototype/transforms/functional/_color.py:186(_scale_channel)
     6000    1.229    0.000    1.229    0.000 {built-in method torch.bincount}
    12000    0.454    0.000    0.454    0.000 {method 'to' of 'torch._C._TensorBase' objects}
     2000    0.187    0.000    0.187    0.000 {built-in method torch.stack}
    18000    0.085    0.000    0.085    0.000 {built-in method torch.div}
     6000    0.058    0.000    0.058    0.000 {built-in method torch.nn.functional.pad}
    10000    0.046    0.000    0.046    0.000 {method 'view' of 'torch._C._TensorBase' objects}
     2000    0.040    0.000    5.752    0.003 /vision/torchvision/prototype/transforms/_transform.py:66(forward)
     2000    0.038    0.000    0.138    0.000 /usr/lib/python3.8/traceback.py:321(extract)
     6000    0.036    0.000    0.036    0.000 {method 'sum' of 'torch._C._TensorBase' objects}
    10000    0.036    0.000    0.036    0.000 {built-in method posix.stat}
     6000    0.032    0.000    0.032    0.000 {built-in method torch.cumsum}
     2000    0.031    0.000    0.053    0.000 /usr/lib/python3.8/traceback.py:388(format)
     6000    0.028    0.000    0.028    0.000 {method 'clamp_' of 'torch._C._TensorBase' objects}
     2000    0.022    0.000    0.022    0.000 {built-in method torch.rand}
     2000    0.018    0.000    5.648    0.003 /vision/torchvision/prototype/transforms/functional/_color.py:209(equalize_image_tensor)
     2000    0.015    0.000    5.395    0.003 /vision/torchvision/prototype/transforms/functional/_color.py:223(<listcomp>)
    54000    0.013    0.000    0.033    0.000 /usr/lib/python3.8/traceback.py:285(line)
     2000    0.012    0.000    0.012    0.000 {method 'unbind' of 'torch._C._TensorBase' objects}
    36000    0.012    0.000    0.012    0.000 {method 'format' of 'str' objects}
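
Listings like the two above can be collected with the standard-library `cProfile` and `pstats` modules, sorted by internal time. This is a minimal sketch, not the script used for this PR; `workload` is a hypothetical stand-in for calling the transform on a tensor image.

```python
import cProfile
import io
import pstats


def workload(n: int = 1000) -> int:
    # Hypothetical stand-in for one call to the transform being profiled.
    return sum(i * i for i in range(n))


def profile_calls(func, repeats: int = 2000, top: int = 20) -> str:
    """Profile `repeats` calls of `func` and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeats):
        func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is the key that pstats reports as "Ordered by: internal time".
    stats.sort_stats("tottime").print_stats(top)
    return buffer.getvalue()


report = profile_calls(workload, repeats=200)
print(report)
```

Running the same harness on both branches and diffing the two reports gives a comparison like the one shown at the top of this comment.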

@vfdev-5 vfdev-5 marked this pull request as ready for review October 11, 2022 09:59
Contributor

@datumbox datumbox left a comment

LGTM, thanks! Feel free to merge on green CI.

Comment on lines +202 to +203
lut.clamp_(0, 255)
lut = lut.to(torch.uint8)
Contributor

Could you add a comment here explaining what we discussed offline about why moving `clamp` and `to` here leads to a faster result?
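
The offline discussion is not recorded in this thread, so what follows is only a plausible reading, consistent with the profile above, where the time spent in `to` drops from 0.957 s to 0.454 s. The lookup table has 256 entries, while the indexed result has one entry per pixel, so clamping and casting the table touches far less data. The sketch assumes the LUT arithmetic shown here (modelled on the calls visible in the profile, not copied from the PR diff), uses hypothetical function names, and checks that both orderings produce the same output.

```python
import torch
import torch.nn.functional as F


def build_lut(img_chan: torch.Tensor):
    """Return the unclamped int64 equalization LUT (256 entries), or None for a flat channel."""
    hist = torch.bincount(img_chan.reshape(-1), minlength=256)
    nonzero_hist = hist[hist != 0]
    step = torch.div(nonzero_hist[:-1].sum(), 255, rounding_mode="floor")
    if step == 0:
        return None
    lut = torch.div(
        torch.cumsum(hist, 0) + torch.div(step, 2, rounding_mode="floor"),
        step,
        rounding_mode="floor",
    )
    # Shift right by one so entry v uses the cumulative count of intensities below v.
    return F.pad(lut[:-1], [1, 0])


def scale_channel_cast_last(img_chan: torch.Tensor) -> torch.Tensor:
    lut = build_lut(img_chan)
    if lut is None:
        return img_chan
    lut = lut.clamp(0, 255)
    # The int64 -> uint8 cast runs over every pixel of the result.
    return lut[img_chan.to(torch.int64)].to(torch.uint8)


def scale_channel_cast_first(img_chan: torch.Tensor) -> torch.Tensor:
    lut = build_lut(img_chan)
    if lut is None:
        return img_chan
    # In-place clamp and cast on the 256 LUT entries only.
    lut.clamp_(0, 255)
    lut = lut.to(torch.uint8)
    return lut[img_chan.to(torch.int64)]


torch.manual_seed(0)
channel = torch.randint(0, 256, (128, 128), dtype=torch.uint8)
out_last = scale_channel_cast_last(channel)
out_first = scale_channel_cast_first(channel)
print(torch.equal(out_last, out_first))
```

Indexing a `uint8` table also yields a `uint8` result directly, so no full-size `int64` intermediate needs converting afterwards.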

@datumbox datumbox added the module: transforms, Perf (For performance improvements), and prototype labels Oct 11, 2022
@vfdev-5
Collaborator Author

vfdev-5 commented Oct 11, 2022

Further improvement is possible if we vectorize the histogram computation with scatter_add_ (cc @lezcano).
As Vasilis suggests proceeding iteratively, let's merge this PR first and leave the other improvements for follow-up PRs.
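
A minimal sketch of what vectorizing the histogram with `scatter_add_` could look like, assuming the input is a `uint8` tensor of shape `(C, H, W)`. This is not the follow-up implementation; `batched_histogram` is a hypothetical name, and the comparison against per-channel `torch.bincount` is only there to show that the two agree.

```python
import torch


def batched_histogram(img: torch.Tensor) -> torch.Tensor:
    """Per-channel 256-bin histogram of a uint8 (C, H, W) image in one call."""
    # scatter_add_ needs int64 indices: one row of pixel values per channel.
    index = img.flatten(start_dim=1).to(torch.int64)
    hist = torch.zeros(index.shape[0], 256, dtype=torch.int64, device=img.device)
    # hist[c, index[c, j]] += 1 for every pixel j of channel c.
    hist.scatter_add_(1, index, torch.ones_like(index))
    return hist


torch.manual_seed(0)
image = torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8)

vectorized = batched_histogram(image)
# Reference: one bincount call per channel, as in the profiled code path.
looped = torch.stack([torch.bincount(c.reshape(-1), minlength=256) for c in image])
print(torch.equal(vectorized, looped))
```

The single `scatter_add_` call replaces the per-channel loop, which is where the 6000 `bincount` calls in the profile come from; whether it is actually faster would need to be measured.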

@vfdev-5 vfdev-5 merged commit 11a2eed into pytorch:main Oct 11, 2022
@vfdev-5 vfdev-5 deleted the proto-small-optim-equalize branch October 11, 2022 21:47
facebook-github-bot pushed a commit that referenced this pull request Oct 17, 2022
Summary:
* [proto] Small improvement for tensor equalize op

* Fix code formatting

* Added a comment on the ops

Reviewed By: NicolasHug

Differential Revision: D40427464

fbshipit-source-id: f40623c83cebe269717151ae52f1fe9af47a3bde