Replace get_image_size/num_channels with get_dimensions #5487

Merged: 10 commits merged on Feb 28, 2022

@datumbox (Contributor) commented Feb 26, 2022

We would like to switch to an image_size convention that stores dimensions in (h, w) format instead of (w, h). Unfortunately, the old get_image_size() method returns them in the latter format, following PIL. Simply changing the order in the new API while keeping the same method name would be a bad choice, as it would certainly lead to confusion. An alternative approach is to introduce new "low-level" get_dimensions() kernels and a get_image_dimensions() utility method that return all image dimensions in (c, h, w) format.
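To make the ordering difference concrete, here is a minimal sketch on a hypothetical `FakeTensor` that carries only a `(..., C, H, W)` shape. `old_get_image_size` and `get_dimensions_sketch` are illustrative stand-ins written for this example, not the torchvision implementations; they only reproduce the two conventions described above, (w, h) versus (c, h, w).

```python
from typing import List, Tuple


class FakeTensor:
    """Hypothetical stand-in for an image tensor: it only carries a (..., C, H, W) shape."""

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self.shape = shape


def old_get_image_size(img: FakeTensor) -> List[int]:
    # Old convention, inherited from PIL's Image.size: width comes first.
    return [img.shape[-1], img.shape[-2]]


def get_dimensions_sketch(img: FakeTensor) -> List[int]:
    # Proposed convention: (channels, height, width), matching the tensor layout.
    # A 2D (H, W) input is treated as single-channel.
    channels = 1 if len(img.shape) == 2 else img.shape[-3]
    return [channels, img.shape[-2], img.shape[-1]]


img = FakeTensor((3, 480, 640))  # C=3, H=480, W=640
print(old_get_image_size(img))     # [640, 480]
print(get_dimensions_sketch(img))  # [3, 480, 640]
```

Because the two helpers disagree on whether width or height comes first, a caller that unpacks `width, height = ...` would silently get swapped values if `get_image_size` were reordered in place, which is the confusion a new name avoids.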

@facebook-github-bot commented Feb 26, 2022

💊 CI failures summary and remediations

As of commit f5ff12b (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@datumbox marked this pull request as draft February 26, 2022 11:06

@datumbox (Contributor Author) left a comment

Added a few comments to assist review:

@datumbox marked this pull request as ready for review February 26, 2022 12:21
@datumbox changed the title from "Replace get_image_size/num_channels with get_image_dims" to "Replace get_image_size/num_channels with get_dimensions" Feb 26, 2022
@pmeier (Collaborator) commented Feb 28, 2022

In general I agree with the direction of this. One caveat, though: what if we have a transformation that needs only the height and width, and we only have a BoundingBox or the like to extract them from? In such a case we have no access to the number of channels, while the image size is simply height, width = bounding_box.image_size. This was my reason for keeping the functions separate: get_image_size accepts a wider range of inputs, while get_image_num_channels only applies to images.
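The asymmetry described here can be sketched with two hypothetical feature classes, `Image` and `BoundingBox`, and two hypothetical helpers; none of these are the prototype API. Height and width are recoverable from either input, while a channel count exists only for images.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Image:
    shape: Tuple[int, int, int]  # hypothetical image feature, (C, H, W)


@dataclass
class BoundingBox:
    # Hypothetical bbox feature: knows the (height, width) of its image, but not the channels.
    boxes: Tuple[Tuple[float, float, float, float], ...]
    image_size: Tuple[int, int]


def height_width(item: Any) -> Tuple[int, int]:
    # Height and width can be recovered from either feature type.
    if isinstance(item, Image):
        _, h, w = item.shape
        return h, w
    if isinstance(item, BoundingBox):
        return item.image_size
    raise TypeError(f"cannot infer height/width from {type(item).__name__}")


def num_channels(item: Any) -> Optional[int]:
    # Only images know their channel count; a bbox yields None.
    return item.shape[0] if isinstance(item, Image) else None


box = BoundingBox(boxes=((10.0, 20.0, 50.0, 80.0),), image_size=(480, 640))
print(height_width(box), num_channels(box))  # (480, 640) None
```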

@datumbox (Contributor Author) commented Feb 28, 2022

@pmeier Is this a problem? If the low-level kernel asks only for the information it needs, i.e. image_size, which is (h, w), then this probably shouldn't be an issue. Depending on what info the user has available, they can use the right method (the proposed one for images, or the attribute for bboxes etc.). This is currently the situation for all kernels, and this proposal doesn't change it. Am I missing something?
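A minimal sketch of a low-level kernel that asks only for the information it needs: the old `image_size` in (h, w) order and the target size. `resize_bounding_box_sketch` is a hypothetical stand-in, not torchvision's `F.resize_bounding_box` (whose signature may differ), and it assumes boxes in XYXY pixel coordinates.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def resize_bounding_box_sketch(
    boxes: List[Box],
    new_size: Tuple[int, int],    # (new_height, new_width)
    image_size: Tuple[int, int],  # (old_height, old_width)
) -> List[Box]:
    """Hypothetical kernel: rescales XYXY boxes using only old and new (h, w), never channels."""
    old_h, old_w = image_size
    new_h, new_w = new_size
    sx, sy = new_w / old_w, new_h / old_h
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]


print(resize_bounding_box_sketch([(10.0, 20.0, 50.0, 80.0)], new_size=(240, 320), image_size=(480, 640)))
# [(5.0, 10.0, 25.0, 40.0)]
```

Since the kernel never sees a channel count, it makes no difference whether the caller obtained `image_size` from an image or from a bounding box.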

@pmeier (Collaborator) commented Feb 28, 2022

If I understand your proposal correctly, get_image_size and get_image_num_channels will be deprecated and later removed in favor of get_image_dimensions, correct? What would the implementation of a transformation look like if it receives a sample dictionary that contains a bounding box and I need the image size from it?

@datumbox (Contributor Author) commented

You will continue reading .image_size from the bbox, exactly as you do now. See:

elif isinstance(input, features.BoundingBox):
    output = F.resize_bounding_box(input, self.size, image_size=input.image_size)
    return features.BoundingBox.new_like(input, output, image_size=self.size)
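The snippet above is one branch of a per-item dispatch. A self-contained sketch of the whole pattern follows, assuming hypothetical `Image`, `BoundingBox` and `ResizeSketch` classes in place of the prototype `features` and transform classes; each bounding box supplies its own `image_size`, so no channel information is needed.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Tuple


@dataclass
class Image:
    shape: Tuple[int, int, int]  # hypothetical image feature, (C, H, W); pixels omitted


@dataclass
class BoundingBox:
    box: Tuple[float, float, float, float]  # XYXY, in pixels
    image_size: Tuple[int, int]  # (height, width) of the image the box belongs to


class ResizeSketch:
    """Hypothetical transform that dispatches on the type of each item in a sample."""

    def __init__(self, size: Tuple[int, int]) -> None:
        self.size = size  # (new_height, new_width)

    def _transform(self, item: Any) -> Any:
        new_h, new_w = self.size
        if isinstance(item, Image):
            channels = item.shape[0]
            return Image(shape=(channels, new_h, new_w))
        elif isinstance(item, BoundingBox):
            # The old size is read from the bbox itself, as in the snippet above.
            old_h, old_w = item.image_size
            sx, sy = new_w / old_w, new_h / old_h
            x1, y1, x2, y2 = item.box
            return replace(item, box=(x1 * sx, y1 * sy, x2 * sx, y2 * sy), image_size=self.size)
        return item  # labels and other metadata pass through unchanged

    def __call__(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        return {key: self._transform(value) for key, value in sample.items()}


sample = {"image": Image((3, 480, 640)), "boxes": BoundingBox((10.0, 20.0, 50.0, 80.0), (480, 640)), "label": 7}
out = ResizeSketch(size=(240, 320))(sample)
print(out["image"].shape, out["boxes"].box)  # (3, 240, 320) (5.0, 10.0, 25.0, 40.0)
```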

@datumbox force-pushed the prototype/image_dims branch from cf62e58 to 8a22aa0 February 28, 2022 09:29
@datumbox force-pushed the prototype/image_dims branch from 8a22aa0 to 253e543 February 28, 2022 09:45
@pmeier (Collaborator) commented Feb 28, 2022

That is only possible at the "dispatch" stage, i.e. once we are already looking at a single item of the sample dictionary. However, sometimes we need the image_size at an earlier stage. For example:

def _get_params(self, sample: Any) -> Dict[str, Any]:
    image = query_image(sample)
    width, height = F.get_image_size(image)

We currently do not support bounding boxes in the crop transforms, but I don't think it is a stretch to do so in the future. If we do, something like RandomResizedCrop should also work in cases we don't pass an image and just a BoundingBox. But get_image_dimensions cannot be applied to bounding boxes, since they don't carry any channel information. get_image_size, however, can be used (if we add support for bounding boxes there).
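The difficulty raised here is that parameter sampling runs on the whole sample, before any per-item dispatch. A minimal sketch of a query that prefers an image and falls back to a bounding box's `image_size`; `query_height_width` and `get_crop_params` are hypothetical names (not the prototype's `query_image`), and the sketch samples a fixed-size crop instead of RandomResizedCrop's scale/ratio logic.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class Image:
    shape: Tuple[int, int, int]  # hypothetical image feature, (C, H, W)


@dataclass
class BoundingBox:
    image_size: Tuple[int, int]  # hypothetical: (H, W) of the image the boxes belong to


def query_height_width(sample: Dict[str, Any]) -> Tuple[int, int]:
    # Prefer an image if the sample has one, otherwise fall back to a bbox's image_size.
    for item in sample.values():
        if isinstance(item, Image):
            return item.shape[-2], item.shape[-1]
    for item in sample.values():
        if isinstance(item, BoundingBox):
            return item.image_size
    raise TypeError("sample contains neither an image nor a bounding box")


def get_crop_params(sample: Dict[str, Any], crop: Tuple[int, int], rng: random.Random) -> Dict[str, int]:
    # A _get_params-style step: it sees the whole sample, not a single dispatched item.
    height, width = query_height_width(sample)
    crop_h, crop_w = crop
    if crop_h > height or crop_w > width:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, height - crop_h)  # randint is inclusive on both ends
    left = rng.randint(0, width - crop_w)
    return {"top": top, "left": left, "height": crop_h, "width": crop_w}


sample = {"boxes": BoundingBox(image_size=(100, 200))}  # no image in this sample
print(get_crop_params(sample, crop=(50, 60), rng=random.Random(0)))
```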

@datumbox (Contributor Author) commented

> something like RandomResizedCrop should also work in cases we don't pass an image and just a BoundingBox

I am not aware of any real world use-case where you Randomly Crop/resize a BBox without the Image. Doing so will destroy the parity between the image and the bbox and render the info useless.
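Why cropping a box on its own loses meaning: a crop re-expresses box coordinates relative to the crop's top-left corner, so the result lines up with pixels only if the image is cropped with the same `top` and `left`. A minimal sketch, assuming XYXY pixel boxes and a hypothetical `crop_box` kernel.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def _clamp(value: float, upper: int) -> float:
    # Keep a coordinate inside [0, upper].
    return min(max(value, 0.0), float(upper))


def crop_box(box: Box, top: int, left: int, height: int, width: int) -> Box:
    """Hypothetical kernel: express a box in the coordinate frame of a crop, clamped to it."""
    x1, y1, x2, y2 = box
    return (
        _clamp(x1 - left, width),
        _clamp(y1 - top, height),
        _clamp(x2 - left, width),
        _clamp(y2 - top, height),
    )


# The new coordinates are only meaningful for an image cropped with the same top/left/height/width.
print(crop_box((100.0, 100.0, 200.0, 200.0), top=50, left=80, height=300, width=300))
# (20.0, 50.0, 120.0, 150.0)
```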

@pmeier (Collaborator) commented Feb 28, 2022

> I am not aware of any real world use-case where you Randomly Crop/resize a BBox without the Image.

To quote you:

> Researchers do crazy things

Given that this doesn't solve anything right now, I would postpone this PR until we have a clearer picture. Or did I miss something for which we need this change?

@pmeier (Collaborator) left a comment

Stamping to unblock.

@NicolasHug (Member) left a comment

Changes to the main area LGTM.

> Unfortunately the old get_image_size() method stored using the latter format due to PIL. Just changing the order in the new API while maintaining the same method name is a bad choice as it will certainly lead to confusion

Fully agree with this. Pushing this further, would it make sense to not have image_size in the prototype area at all, and just have image_dimensions that always returns C, H, W (with C potentially being None if it's unknown)? BBoxes could then just do _, h, w = bbox.image_dimensions?
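A sketch of what this proposal could look like, assuming a hypothetical `image_dimensions` attribute on every feature (torchvision does not define it); `None` marks an unknown channel count, as suggested above.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

Dims = Tuple[Optional[int], int, int]  # (C, H, W); C is None when unknown


@dataclass
class Image:
    shape: Tuple[int, int, int]  # hypothetical image feature, (C, H, W)

    @property
    def image_dimensions(self) -> Dims:
        return self.shape


@dataclass
class BoundingBox:
    # Hypothetical: a bbox stores its image's dimensions in CHW order too, with C unknown.
    image_dimensions: Dims


def height_width(item: Any) -> Tuple[int, int]:
    # One code path for every feature type: always unpack (C, H, W) and drop C.
    _, h, w = item.image_dimensions
    return h, w


print(height_width(Image((3, 480, 640))), height_width(BoundingBox((None, 480, 640))))
# (480, 640) (480, 640)
```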

@datumbox (Contributor Author) commented

@NicolasHug I think that's a perfectly viable proposal, and it would fully align TorchVision on the CHW convention everywhere. The only reason I see why we might not want to include the channels of the original image in BBoxes and Masks is that the low-level kernels don't need this info when they operate on them. For example, when you resize a bbox, you need the old and new width/height, not the channels. But again, I personally value consistency more, so I don't think that's a major deal-breaker.

Another option would be to store height and width separately in the meta-data and stop passing image_size to methods. This will eliminate any confusion for what's what, though it will make things more verbose and will cause BC-breakages on APIs like resize().
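This second option, explicit height and width instead of an ordered `image_size` tuple, could look like the following minimal sketch; `resize_boxes_explicit` is a hypothetical kernel using keyword-only arguments and XYXY pixel boxes.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def resize_boxes_explicit(
    boxes: List[Box],
    *,
    old_height: int,
    old_width: int,
    new_height: int,
    new_width: int,
) -> List[Box]:
    """Hypothetical kernel: keyword-only height/width arguments replace an ordered image_size tuple."""
    sx, sy = new_width / old_width, new_height / old_height
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]


print(resize_boxes_explicit([(0.0, 0.0, 100.0, 50.0)], old_height=100, old_width=200, new_height=50, new_width=400))
# [(0.0, 0.0, 200.0, 25.0)]
```

Keyword-only arguments make it impossible to swap width and height silently, at the cost of four arguments where one tuple used to suffice: the verbosity and BC cost mentioned above.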

@pmeier I'm positive that the use-case you describe is not something we need to support, as it's not a valid ML scenario. Concerning why we should move this forward: this PR affects your proposal at #5492. Note that instead of having multiple PRs with multiple proposals (something that is very hard to follow and share with the team for feedback), we should iterate on top of the prototype area on the main branch.

@datumbox merged commit 095437a into pytorch:main Feb 28, 2022
@datumbox deleted the prototype/image_dims branch February 28, 2022 13:16
@gastruc commented Mar 1, 2022

Hi, while running this tutorial https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html, I get the following error, which may be linked to this merge:

  File "/home/studio-lab-user/.conda/envs/default/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/studio-lab-user/.conda/envs/default/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/studio-lab-user/.conda/envs/default/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/tmp/ipykernel_46/1346308532.py", line 47, in __getitem__
    img, target = self.transforms(img, target)
  File "/home/studio-lab-user/sagemaker-studiolab-notebooks/transforms.py", line 26, in __call__
    image, target = t(image, target)
  File "/home/studio-lab-user/.conda/envs/default/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/studio-lab-user/sagemaker-studiolab-notebooks/transforms.py", line 37, in forward
    _, _, width = F.get_dimensions(image)
AttributeError: module 'torchvision.transforms.functional' has no attribute 'get_dimensions'

How can I fix this?

@datumbox (Contributor Author) commented Mar 1, 2022

@gastruc Please open an issue where you provide a minimal example that reproduces the problem.

From what I understand, you are using get_dimensions() in your custom transforms.py file, but you have installed a TorchVision version that doesn't support it. This PR was merged 22 hours ago, so the change is likely not available yet, even in the nightly builds.
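For code that must run against TorchVision builds both with and without this PR, a minimal sketch that feature-detects `get_dimensions` and otherwise falls back to `get_image_size`, keeping in mind that the two return the width in different positions. `image_width` is a hypothetical helper; in real use `F` would be `torchvision.transforms.functional`, and here two `SimpleNamespace` stand-ins replace it so the logic can be checked without TorchVision. Upgrading to a build that includes this PR is the simpler fix.

```python
from types import SimpleNamespace
from typing import Any


def image_width(F: Any, image: Any) -> int:
    """Return the image width using whichever API the given functional module provides."""
    if hasattr(F, "get_dimensions"):
        _, _, width = F.get_dimensions(image)  # newer API: [channels, height, width]
    else:
        width, _ = F.get_image_size(image)  # older API: [width, height]
    return width


# Stand-ins for an older and a newer functional module, both describing a 3x480x640 image.
old_F = SimpleNamespace(get_image_size=lambda img: [640, 480])
new_F = SimpleNamespace(get_dimensions=lambda img: [3, 480, 640])
print(image_width(old_F, None), image_width(new_F, None))  # 640 640
```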

@prabhat00155 (Contributor) commented Mar 3, 2022

This PR introduced a failure in test_transforms.py::TestAccImage::test_accimage_crop, which was not caught by CI because accimage is not installed in CI.

facebook-github-bot pushed a commit that referenced this pull request Mar 4, 2022
Summary:
* Replace get_image_size/num_channels with get_image_dims

* Reduce verbosity

* Fix JIT-scriptability

* Refactoring

* More refactoring

* Replace all _FP/_FT direct calls.

* Remove usages of get_image_size and get_image_num_channels from code-base.

* Fix JIT issues

* Adding missing assertion.

Reviewed By: NicolasHug

Differential Revision: D34579514

fbshipit-source-id: 851038b155279541836f2f3228a19f1d0239af57
6 participants