-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Replace get_image_size/num_channels with get_dimensions #5487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💊 CI failures summary and remediationsAs of commit f5ff12b (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
feba98a
to
f7513a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few comments to assist review:
In general I agree with the direction of this. One caveat though: what if we have a transformation that needs only the height and width and we only have a |
@pmeier Is this a problem? If the low-level kernel asks for only the information it needs aka |
IIUC your proposal correctly, |
You will continue doing vision/torchvision/prototype/transforms/_geometry.py Lines 47 to 49 in 04a3eab
|
cf62e58
to
8a22aa0
Compare
8a22aa0
to
253e543
Compare
That is only possible at the "dispatch" state, i.e. if we already look at a single item of the sample dictionary. However, sometimes we need the vision/torchvision/prototype/transforms/_geometry.py Lines 110 to 112 in f40c8df
We currently do not support bounding boxes in the crop transforms, but I don't think it is a stretch to do so in the future. If we do, something like |
I am not aware of any real world use-case where you Randomly Crop/resize a BBox without the Image. Doing so will destroy the parity between the image and the bbox and render the info useless. |
To quote yourself
Given that this doesn't solve anything right now, I would postpone this PR until we have a clearer picture. Or did I miss anything for which we need this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stamping to unblock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes to the main area LGTM.
Unfortunately the old get_image_size() method stored using the latter format due to PIL. Just changing the order in the new API while maintaining the same method name is a bad choice as it will certainly lead to confusion
Fully agree with this. Pushing this further, would it make sense to not have image_size
on the prototype area, and just have image_dimensions
that always returns C, H, W (with C potentially being None if it's unknown)? BBoxes could just do _, h, w = bbox.image_dimension
?
@NicolasHug I think that's a perfectly viable proposal which will align fully the use of CHW everywhere across TorchVision. The only reason I see why we might not want to include the channels of the original image in BBox and Masks is because the low-level kernels don't need this info when they make operations on them. For example when you resize a bbox, you need the old and new width/height not the channels. But again, I personally value more consistency so I don't think that's a major deal breaker. Another option would be to store height and width separately in the meta-data and stop passing @pmeier I'm positive that the use-case you describe is not something we need to support, as it's not a valid ML scenario. Concerning why we should move this forward, this PR affects your proposal at #5492. Note that instead of having multiple PRs with multiple proposals (something that is very hard to follow and share with the team for feedback), we should do iterations on top of the prototype area on main branch. |
Hi, running this tutorial https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html, I get this error that may be linked with this merge:
How to fix this? |
@gastruc Please open an issue where you provide a minimal example that reproduces the problem. From what I understand you are using |
This PR has introduced |
Summary: * Replace get_image_size/num_channels with get_image_dims * Reduce verbosity * Fix JIT-scriptability * Refactoring * More refactoring * Replace all _FP/_FT direct calls. * Remove usages of get_image_size and get_image_num_channels from code-base. * Fix JIT issues * Adding missing assertion. Reviewed By: NicolasHug Differential Revision: D34579514 fbshipit-source-id: 851038b155279541836f2f3228a19f1d0239af57
We would like to switch to an
image_size
convention that stores dimensions in(h, w)
format instead of(w, h)
. Unfortunately the oldget_image_size()
method stored using the latter format due to PIL. Just changing the order in the new API while maintaining the same method name is a bad choice as it will certainly lead to confusion. An alternative approach is to introduce new "low-level"get_dimensions()
kernels and aget_image_dimensions()
utility method that return all image dimensions in the format(c, h, w)
.