Vectorize equalize transformation #3334


Closed
wants to merge 6 commits

Conversation

NicolasHug
Member

Closes #3173

This PR vectorizes the histograms computation (over batches and over channels) in the equalize transformation.
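The core idea of vectorizing the histogram computation can be done with a single bincount call by offsetting each (image, channel) slice into its own disjoint 256-bin range. A self-contained sketch of that idea (not necessarily the PR's exact code):

```python
import torch

def batched_histogram(imgs: torch.Tensor) -> torch.Tensor:
    """Per-channel histograms for a (N, C, H, W) uint8 batch, returned as (N, C, 256)."""
    n, c = imgs.shape[:2]
    flat = imgs.reshape(n * c, -1).long()
    # Shift each (image, channel) slice into its own disjoint 256-wide bin range,
    # so a single bincount over the flattened tensor counts all slices at once.
    offsets = torch.arange(n * c, device=imgs.device)[:, None] * 256
    counts = torch.bincount((flat + offsets).reshape(-1), minlength=n * c * 256)
    return counts.reshape(n, c, 256)
```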

Comment on lines 921 to 927
if img.ndim == 3:
    return _remap_single_image(img, luts)
else:  # more than one image
    imgs = img
    return torch.stack([
        _remap_single_image(img, luts) for (img, luts) in zip(imgs, luts)
    ])
Member Author


Ideally we would get rid of the stack() calls: at that point all we want to do is to remap img (...xCxHxW) according to the values in luts (...xCx256).

There's probably a vectorized one-liner to do that, maybe with gather(), but I couldn't get it to work. Any help welcome :)
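For what it's worth, gather can express this remap: with imgs of shape (N, C, H, W) and luts of shape (N, C, 256), flattening the spatial dimensions makes the index tensor line up with luts along dim 2. A sketch with assumed shapes, not the PR's code:

```python
import torch

def remap_with_gather(imgs: torch.Tensor, luts: torch.Tensor) -> torch.Tensor:
    # imgs: (N, C, H, W) integer values in [0, 255]; luts: (N, C, 256)
    n, c, h, w = imgs.shape
    # gather along dim 2: out[i, j, k] = luts[i, j, index[i, j, k]]
    return luts.gather(2, imgs.reshape(n, c, h * w).long()).reshape(n, c, h, w)
```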

Member


You can get both the single image and the batch of images to work with the approach I mentioned just above. The implementation would look somewhat like:

idx_batch = torch.arange(0, batch_size, device=imgs.device)[:, None, None, None]
idx_channels = torch.arange(0, num_channels, device=imgs.device)[None, :, None, None]
return luts[idx_batch, idx_channels, imgs]
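As a runnable version of the snippet above (shapes assumed: imgs is (N, C, H, W), luts is (N, C, 256)):

```python
import torch

def remap_batch(imgs: torch.Tensor, luts: torch.Tensor) -> torch.Tensor:
    # imgs: (N, C, H, W) integer values in [0, 255]; luts: (N, C, 256)
    batch_size, num_channels = imgs.shape[:2]
    idx_batch = torch.arange(0, batch_size, device=imgs.device)[:, None, None, None]
    idx_channels = torch.arange(0, num_channels, device=imgs.device)[None, :, None, None]
    # Advanced indexing broadcasts the three index tensors to (N, C, H, W),
    # pairing every pixel with the LUT of its own image and channel.
    return luts[idx_batch, idx_channels, imgs.long()]
```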

@datumbox datumbox requested a review from fmassa February 1, 2021 13:57
Member

@fmassa fmassa left a comment


Thanks for the PR!

I have a few suggestions which I think would allow vectorizing the rest of the operations.

Additionally, it would be good to check whether we get any runtime speedups (either on CPU or GPU). The vectorized implementation uses a bit more memory, so it would be nice to show speedups to justify that.

Comment on lines 889 to 891
return torch.stack([
    channel_values[channel_indices] for (channel_values, channel_indices) in zip(luts, img)
]).to(torch.uint8)
Member


You can replace the for loop and the stack with something like

idxs = torch.arange(0, img.shape[-3], device=img.device)[:, None, None]
luts[idxs, img]
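A self-contained version of that single-image indexing (shapes assumed: img is (C, H, W), luts is (C, 256)):

```python
import torch

def remap_single(img: torch.Tensor, luts: torch.Tensor) -> torch.Tensor:
    # img: (C, H, W) integer values in [0, 255]; luts: (C, 256)
    idxs = torch.arange(0, img.shape[-3], device=img.device)[:, None, None]
    # idxs broadcasts against img, pairing each channel with its own LUT
    return luts[idxs, img.long()].to(torch.uint8)
```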


@NicolasHug
Member Author

Thanks for the review @fmassa .

Interestingly, this PR does not seem to lead to any improvement (quite the opposite actually). Locally on my laptop I get the following for a 64x3x128x128 tensor:

This PR: 1.51 s ± 88.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Master:  42.1 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
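Numbers like these can also be collected with torch.utils.benchmark instead of %timeit, which handles warmup and threading more carefully. A sketch, with a smaller assumed shape and a per-channel histogram loop as the timed statement:

```python
import torch
import torch.utils.benchmark as benchmark

imgs = torch.randint(0, 256, (8, 3, 32, 32), dtype=torch.uint8)

# Time a per-channel histogram loop over the flattened (N*C, H*W) slices
timer = benchmark.Timer(
    stmt="torch.stack([torch.bincount(c.long(), minlength=256)"
         " for c in imgs.flatten(0, 1).flatten(1)])",
    globals={"imgs": imgs, "torch": torch},
)
measurement = timer.timeit(10)
print(measurement.mean)  # seconds per run
```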

Do you think it's worth investigating further where the slowdown might come from, or should we just close this PR and the original issue?

@codecov

codecov bot commented Feb 1, 2021

Codecov Report

Merging #3334 (9f2fb98) into master (859a535) will increase coverage by 0.01%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #3334      +/-   ##
==========================================
+ Coverage   73.90%   73.91%   +0.01%     
==========================================
  Files         104      104              
  Lines        9618     9622       +4     
  Branches     1544     1542       -2     
==========================================
+ Hits         7108     7112       +4     
  Misses       2028     2028              
  Partials      482      482              
Impacted Files Coverage Δ
torchvision/transforms/functional_tensor.py 79.39% <100.00%> (+0.15%) ⬆️



return torch.stack([_equalize_single_image(x) for x in img])
luts = (torch.cumsum(hist, dim=-1) + (step // 2)) // step
Contributor


I think there is a potential bug here. In the original implementation the step was a scalar, so the if statement was enough to protect against division by zero. Here it's a C-dimensional vector, and if some of its values are zero it can lead to infs/NaNs.
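One way to guard against that, sketched under the assumption that hist has shape (..., 256) and follows the same cumsum formula (the real code may additionally need to clamp the result to [0, 255]):

```python
import torch

def safe_luts(hist: torch.Tensor) -> torch.Tensor:
    # hist: (..., 256) per-channel histograms
    hist = hist.float()
    step = ((hist.sum(-1) - hist[..., -1]) // 255)[..., None]  # (..., 1)
    # Divide by max(step, 1) to avoid inf/nan, then restore the identity
    # mapping (image left unchanged) wherever step was actually zero.
    luts = (torch.cumsum(hist, dim=-1) + step // 2) // step.clamp(min=1)
    identity = torch.arange(256, dtype=luts.dtype, device=luts.device)
    return torch.where(step == 0, identity, luts)
```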

@fmassa
Member

fmassa commented Feb 2, 2021

Do you think it's worth investigating further where the slowdown might come from, or should we just close this PR and the original issue?

I think the 20x slowdown on a CPU is a bit surprising. I would have expected that it could be a bit slower on CPU and faster on the GPU.

I would say to investigate this a bit further to see where the slowdown comes from -- is it from the histogram creation? If yes, then we could probably leave the histogram creation in a for loop and then perform the batch indexing as we currently have in this PR.

@NicolasHug
Member Author

The slowdown does seem to come from the histogram creation, which accounts for the vast majority of the computation time. As a result, I'm not sure we can expect a significant improvement even if we only use batch indexing (we'd still need to stack the histograms anyway, and the execution time is dominated by the histogram computation).

Note that on GPU the difference is smaller, but master is still faster.
Histogram creation for a (32, 3, 64, 64) batch:

This PR: 14.6 ms ± 616 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Master:  8.81 ms ± 87 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

On top of that the solution in this PR is much more memory-consuming, so we can probably just close this PR?

@fmassa
Member

fmassa commented Feb 16, 2021

Thanks for investigating this @NicolasHug !

Successfully merging this pull request may close these issues.

Vectorize the equalize transformation
4 participants