Allow torchvision.io to pass through ToTensor() #2959


Closed
jgbradley1 opened this issue Nov 4, 2020 · 9 comments

Comments

@jgbradley1
Contributor

jgbradley1 commented Nov 4, 2020

🚀 Ensure torchvision.io is a drop-in replacement with current workflows

The following snippet will fail.

img = torchvision.io.read_image("img.png")    # read_image requires a path; returns a uint8 tensor
img = torchvision.transforms.ToTensor()(img)  # fails: ToTensor does not accept tensors

Pitch

Consider making native io compatible with existing transform workflows by allowing the tensor type to pass through ToTensor(). This would still scale down tensor values to the range 0-1 and not impact downstream transformations.
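The proposed pass-through behavior could look roughly like this (a minimal sketch using only torch, not the actual torchvision implementation; `to_tensor_passthrough` is a hypothetical name):

```python
import torch

# Hypothetical sketch of the proposed behavior: a ToTensor-like callable
# that also accepts a uint8 tensor and scales it to the [0.0, 1.0] range.
def to_tensor_passthrough(img):
    if isinstance(img, torch.Tensor):
        if img.dtype == torch.uint8:
            return img.to(torch.float32) / 255.0
        return img  # already a float tensor: pass through unchanged
    raise TypeError("PIL/ndarray handling omitted from this sketch")

img = torch.randint(0, 256, (3, 4, 4), dtype=torch.uint8)
out = to_tensor_passthrough(img)  # float32 tensor in [0.0, 1.0]
```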

@jgbradley1 jgbradley1 changed the title Make ToTensor() compatible with torchvision.io Allow torchvision.io to pass through ToTensor() Nov 4, 2020
@vfdev-5
Collaborator

vfdev-5 commented Nov 4, 2020

@jgbradley1 img = torchvision.io.read_image() gives a tensor, right? Why would you like to transform a tensor to a tensor?
If you would like to scale values, please use ConvertImageDtype. What do you think ?

@jgbradley1
Contributor Author

jgbradley1 commented Nov 4, 2020

ConvertImageDtype would work. The actual issue I'm raising here is a conflict with that design choice and the impact it will have going forward. The first question I should ask, for clarification, is whether or not you all want torchvision.io to be treated as a drop-in replacement for PIL image loading?

A long time ago, someone decided that ToTensor would convert PIL to tensor AND scale values down to the 0-1 range (despite the function name not being explicit about it). For better or worse, the standard practice by many in the community is now to regularly place ToTensor in a composition of transforms and assume the scaling is being performed.

So do you let ToTensor accept tensors, so people aren't forced to update multiple parts of their DL pipelines when switching to torchvision.io...or do you cause a breaking change in the design and force people to add a new transform, all because they switched to torchvision.io?

Since most of the transforms accept PIL or tensor type, it makes sense to me under the current design to allow tensors to pass through ToTensor.

@jgbradley1
Contributor Author

Perhaps I'm just way off-base with this line of thinking. ToTensor is documented as a conversion transform, so I might be overstating how often people depend on ToTensor for value scaling purposes.

I'm okay if the decision is

people should read the documentation and make necessary changes to use torchvision.io correctly

@fmassa
Member

fmassa commented Nov 4, 2020

Hi,

Thanks for sparking this discussion.

Even though the current default is to rely on ToTensor, it has multiple issues due to its implicit conversion to the 0-1 range.
This is even more pronounced when using numpy arrays internally, and has led to a number of issues; see #546 and #1595 for some examples.
I believe the core issue is that ToTensor tries to do too many things at once, and trying to fix all corner cases will lead us down a rabbit hole.

My current thinking is to move users away from ToTensor, and instead ask them to use more self-contained functionality such as ConvertImageDtype and PILToTensor if they want to use torchvision.io.
But we won't be removing ToTensor because it's such a widely used functionality, although we would like to encourage a different usage pattern.

If we were to change ToTensor to accept tensors and perform this implicit scaling (for the convenience of users), I would prefer that we raise a warning in this use-case, with a message, so that this behavior is not encouraged.
That way users could give it a try and see how it fits their pipeline, while the warning reminds them that this is not the usage we would encourage.

Thoughts?

@jgbradley1
Contributor Author

I like the suggestion to add a warning. While this request is not really a technical bug, it would help the community evolve and push everyone in the right direction, away from using ToTensor. This will only happen as people stop using a PIL/numpy image loader and torchvision.io matures as the de facto standard.

Just summarizing what this request involves, then. ToTensor shall:

  1. accept the output of torchvision.io.read_image, a uint8 tensor (torch.ByteTensor) with values in the range [0, 255]
  2. convert it to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
  3. warn and encourage users to switch to the more self-contained ConvertImageDtype for image conversion operations

@fmassa
Member

fmassa commented Nov 18, 2020

@jgbradley1 I think I would be fine with what you propose, but let's hear what @vfdev-5 and @datumbox think as well.

@datumbox
Contributor

I'm on the fence on this one. On one hand, the proposal looks good and extends existing functionality in a way that could be useful to users. On the other hand, implementing the improvement might reinforce use of this problematic function. Since we are actively talking about moving users off this specific method, is it worth considering just adding a warning to the existing code and avoiding any further extensions?

My concern stems from the fact that standardization/preprocessing is an important part of training/inference that should be handled explicitly by the user. This is particularly true in transfer learning, where if you mess this up you can get really poor results. I suspect that with the current setup, many users might be calling ToTensor() before passing images to a pre-trained model without realizing it scales down the values.

@vfdev-5
Collaborator

vfdev-5 commented Nov 19, 2020

I also think that passing a uint8 tensor into ToTensor in order to convert it to float and rescale it is somewhat unexpected, given the ToTensor method name.

@fmassa
Member

fmassa commented Nov 20, 2020

Ok, I'm convinced by @datumbox's and @vfdev-5's comments, so I think that in order to get users to try this out we should instead update the examples / tutorials with the new code.

As such, I'll be closing this issue, but thanks again for the discussion and let us know if you have any more thoughts / comments.

@fmassa fmassa closed this as completed Nov 20, 2020