[FEEDBACK] Multi-weight support prototype API #5088
I was looking at the example in the article, and the one thing that jumped out at me as a potential source of confusion (especially for newcomers) was the `ResNet50_Weights` naming. I would prefer something like:

```python
resnet = PM.ResNet50.ImageNet1K_V1
weights = resnet.weights
preprocess = resnet.transforms()
class_ids = resnet.meta["categories"]
```

And the example would turn into:

```python
from torchvision.prototype import models as PM

# Step 1: Initialize model
resnet = PM.ResNet50.ImageNet1K_V1  # Less verbose than ResNet50_Weights; also stops the ResNet50_weights confusion for those not using autocomplete IDEs
model = PM.resnet50(weights=resnet.weights)  # or maybe just resnet
model.eval()

# Step 2: Initialize the inference transforms
preprocess = resnet.transforms()

# Step 3: Apply inference preprocessing transforms (img is a PIL image loaded beforehand)
batch = preprocess(img).unsqueeze(0)
prediction = model(batch).squeeze(0).softmax(0)

# Step 4: Use the model and print the predicted category
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = resnet.meta["categories"][class_id]  # I like meta attributes
print(f"{category_name}: {100 * score}%")
```

Of course all this is a matter of style/preference, and there might be reasons it was implemented that way; I haven't looked at the code. Overall I really like this feature, it will surely make life easier ❤️ No more copy-pasting transforms, hurray! 🥳
@cceyda Thanks for taking the time to provide detailed feedback. 😄

Originally we considered going with a proposal similar to what you suggested, as it would lead to less verbose names, but unfortunately that naming could not be adopted.

Concerning linking the transforms to the model vs the weights abstraction, one reason we selected the latter is to allow for extra flexibility. The same model variant can be used in different training recipes and produce weights that require different preprocessing transforms. A good real-world example of that in TorchVision is EfficientNet B1, which requires different preprocessing for its v1 and v2 weights.

For full transparency, the points you raised are valid and many of them were brought up during development (see discussions at #4937 and #4652). Unfortunately, due to the constraints imposed, there is no single perfect solution that leads to a concise, consistent, pythonic and backwards-compatible API. Multiple good solutions exist, and it depends on what one optimizes for. Let me know your thoughts.
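For illustration, here is a minimal sketch of how two weight sets of the same model carry different preprocessing. The enum and value names are assumptions that follow the naming convention discussed in this thread, not verbatim API:

```python
from torchvision.prototype import models as PM

# Sketch: same model builder, two recipes, two different preprocessing pipelines.
w1 = PM.EfficientNet_B1_Weights.ImageNet1K_V1
w2 = PM.EfficientNet_B1_Weights.ImageNet1K_V2

# Each weights entry bundles its own inference transforms, so the right
# resize/crop/normalization is always paired with the checkpoint.
preprocess_v1 = w1.transforms()
preprocess_v2 = w2.transforms()

model = PM.efficientnet_b1(weights=w2)
```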
If multiple weights are available per model, it'd be nice if we could programmatically scan through all available weights and automatically associate them with their corresponding model function (i.e. lowercase `resnet50`). On top of that, each model function should clearly document to the user what weights are available. A rough sketch of the kind of scan I have in mind is below.
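A minimal sketch of such a scan, assuming the prototype naming convention (builder `resnet50` ↔ enum `ResNet50_Weights`) holds for every model, and that `WeightsEnum` is importable from the private `_api` module (an assumption, not a stable path):

```python
from torchvision.prototype import models as PM
from torchvision.prototype.models._api import WeightsEnum  # assumption: private module path

def available_weights():
    """Map each model builder name to the members of its weights enum."""
    mapping = {}
    for name in dir(PM):
        obj = getattr(PM, name)
        if isinstance(obj, type) and issubclass(obj, WeightsEnum) and name.endswith("_Weights"):
            builder_name = name[: -len("_Weights")].lower()  # ResNet50_Weights -> resnet50
            if hasattr(PM, builder_name):
                mapping[builder_name] = list(obj)
    return mapping

for builder, weights in available_weights().items():
    print(builder, [w.name for w in weights])
```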
@divideconcept Thanks a lot for your feedback. Your recommendations are indeed spot on. It's very reassuring for us that you identify the same limitations, as this is something we want to tackle in our 2022 Q1 roadmap.

Let's stay in touch concerning the prototype work. Given your work on TorchStudio, it would be great to coordinate and get your input early.
I think the prototype API is a wonderful design. Now that you add the data transforms for inference, do you also plan to include the training transforms?
@xiaohu2015 Thanks for the feedback! You are right, currently we don't include the training transforms but instead provide a link to the training recipe. There was originally a lot of debate on whether we should include them, and this is still an open question; for full transparency, the reasons they are not currently included were discussed at length, so that the community is aware and can provide feedback.

For now we opted for providing a reference to the training recipe, but I would love to hear your input on how we can boost reproducibility further.
@datumbox I found that some new class names in the new API, such as `ResNet50_Weights`, mix CamelCase with underscores.

I am also wondering why it has to be an Enum. It seems that all the new WeightsEnum classes come with some URL, so I feel a HuggingFace Hub style interface (input: URL; output: module, e.g., transform, model weights, model config, etc.) may be convenient for many PyTorch users.
@yoshitomo-matsubara Thanks for the feedback.

You are right. We are not happy with the current naming situation. This is why I was eager to discuss this here on GitHub, summarize the challenges and the reasons we adopted the specific naming convention and, if possible, find alternatives. The issue is that TorchVision already has extremely long model builder names.

After lengthy offline and online discussions (see #4937 (comment), #4652 (comment)), the convention adopted was to take the model builder name, capitalize it properly and append `_Weights`.

If you have any suggestions on how to improve the situation, I would really love to hear them!
First of all, it is possible to pass strings to the model builder. So the following will work, though it's discouraged:

```python
from torchvision.prototype import models as PM

model = PM.resnet50(weights="ResNet50_Weights.ImageNet1K_V2")
# or
model = PM.resnet50(weights="ImageNet1K_V2")
```

Why discouraged? Because then you don't have access to the weights' transforms. So instead, if one wants to work with strings, it is advised to do:

```python
from torchvision.prototype import models as PM

w = PM.get_weight("ResNet50_Weights.ImageNet1K_V2")
model = PM.resnet50(weights=w)
preprocess = w.transforms()
# ...
```

Enums allowed us to group together information related to the model weights in a single object and ensure it's always available for us during model construction. This allowed us to enforce schemas on the meta-data and retrieve key information (such as the backend and the non-quantized weights in the case of quantization).
For backwards compatibility reasons, the output of model builders must be the model, not a tuple of things. Moreover, there are other limitations, such as JIT-scriptability, that we had to factor in. The above API choice does not stop us from offering a registration mechanism that gives similar functionality in the future. It was actually within our plans to offer such a new API, but we decided to decouple it from this proposal to give more time for feedback. You can see an example of such a registration mechanism in the prototype datasets. Let me know your thoughts.
Thank you @datumbox for the clarification. In terms of grouping information by enums, what about a hierarchical structure, e.g., grouping the weights of the FasterRCNN series under a single enum namespace instead of one flat `*_Weights` class per model variant? (A sketch of the idea follows below.) What do you think?
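The original snippets in this comment did not survive; here is a minimal sketch of what such a hierarchical grouping might look like (all names illustrative, not the actual proposal; `Weights`/`WeightsEnum` assumed importable from the prototype's private `_api` module, and `CocoEval`'s location is also an assumption):

```python
from torchvision.prototype.models._api import Weights, WeightsEnum
from torchvision.prototype.transforms import CocoEval  # assumed location of the eval preset

class FasterRCNNWeights:
    class ResNet50FPN(WeightsEnum):
        COCO_V1 = Weights(
            url="https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth",
            transforms=CocoEval,
            meta={"recipe": "..."},
        )

    class MobileNetV3LargeFPN(WeightsEnum):
        COCO_V1 = Weights(
            url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_fpn-fb6a3cc7.pth",
            transforms=CocoEval,
            meta={"recipe": "..."},
        )

# Access would read hierarchically:
weights = FasterRCNNWeights.MobileNetV3LargeFPN.COCO_V1
```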
@yoshitomo-matsubara Thanks for the proposal and for including an example to clarify what you mean.

Yes, that is correct; it comes from the existing naming convention of TorchVision. Using hierarchical structures is a viable solution, with the benefit that it groups together weights that are in the same family. I'm not sure it addresses the length aspect, though, as the fully-qualified names remain long.

Correct, the values of the enums don't have to be capitalized like this; a few current examples of such values look like `ImageNet1K_V1`. @NicolasHug @pmeier Any thoughts on changing the capitalization of the values of Weight Enums to all capitals?
Given that PEP 8 states:

> Constants are usually defined on a module level and written in all capital letters with underscores separating words.

and enum fields are very much constants, I would prefer to also use this naming scheme for them.
I agree that enum fields like `ImageNet1K_V1` should follow the all-caps constant naming scheme (i.e., `IMAGENET1K_V1`).

I think the main problem in the current naming is the flat, per-variant `*_Weights` classes rather than the field capitalization itself.
I might miss something there, but will the change look simple like below?

```python
@handle_legacy_interface(weights=("pretrained", ResNetWeights.RESNET18.IMAGENET1K_V1))
def resnet18(*, weights: Optional[ResNetWeights.RESNET18] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNetWeights.RESNET18.verify(weights)
    return _resnet(BasicBlock, [2, 2, 2, 2], weights, progress, **kwargs)

@handle_legacy_interface(weights=("pretrained", ResNetWeights.RESNET34.IMAGENET1K_V1))
def resnet34(*, weights: Optional[ResNetWeights.RESNET34] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNetWeights.RESNET34.verify(weights)
    return _resnet(BasicBlock, [3, 4, 6, 3], weights, progress, **kwargs)
```

instead of

```python
@handle_legacy_interface(weights=("pretrained", ResNet18_Weights.ImageNet1K_V1))
def resnet18(*, weights: Optional[ResNet18_Weights] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNet18_Weights.verify(weights)
    return _resnet(BasicBlock, [2, 2, 2, 2], weights, progress, **kwargs)

@handle_legacy_interface(weights=("pretrained", ResNet34_Weights.ImageNet1K_V1))
def resnet34(*, weights: Optional[ResNet34_Weights] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNet34_Weights.verify(weights)
    return _resnet(BasicBlock, [3, 4, 6, 3], weights, progress, **kwargs)
```
The current implementation uses the class information to verify that the provided weights belong to the expected enum for that model builder. Frankly, I'm still not convinced that the hierarchical approach buys us enough to justify the change.
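For context, a rough sketch of what such a verification step could look like (an illustration, not the exact code in `_api.py`; the `from_str` helper is quoted later in this thread):

```python
from enum import Enum

class WeightsEnum(Enum):
    @classmethod
    def from_str(cls, value: str) -> "WeightsEnum":
        if value in cls.__members__:
            return cls.__members__[value]
        raise ValueError(f"Invalid value {value} for enum {cls.__name__}.")

    @classmethod
    def verify(cls, obj):
        # Accept None, a member of this enum, or a string naming a member;
        # reject weights that belong to a different model's enum.
        if obj is not None:
            if type(obj) is str:
                obj = cls.from_str(obj)
            elif not isinstance(obj, cls):
                raise TypeError(
                    f"Invalid Weight class provided; expected {cls.__name__} "
                    f"but received {obj.__class__.__name__}."
                )
        return obj
```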
Sounds good. Happy to review a PR that makes the capitalization change of enum values across the prototype API.
Does it mean these two functions would still work with the hierarchical structure (e.g., `weights = ResNetWeights.RESNET18.verify(weights)`; see https://github.com/pytorch/vision/blob/main/torchvision/prototype/models/_api.py#L50-L66)?

If the hierarchical structure is not something torchvision wants to offer, I came up with a new idea. Instead of having the hierarchical structure, what about introducing a per-family weights module? Then the weights class could look like:

```python
class MobileNetV3LargeFpnWeights(WeightsEnum):
    COCO_V1 = Weights(
        url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_fpn-fb6a3cc7.pth",
        transforms=CocoEval,
        meta={
            **_COMMON_META,
            "num_params": 19386354,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-mobilenetv3-large-fpn",
            "map": 32.8,
        },
    )
    default = COCO_V1
```

While the class name itself has no link to FasterRCNN, the module it lives in would provide that context. It may also shorten the Python file per model, compared to having the weight enum classes and the model classes + functions in one file.

Regardless of whether or not my ideas above are adopted, I'm happy to make the changes and send a PR :D

P.S. As a minor optimization, we could replace

```python
@classmethod
def from_str(cls, value: str) -> "WeightsEnum":
    for k, v in cls.__members__.items():
        if k == value:
            return v
    raise ValueError(f"Invalid value {value} for enum {cls.__name__}.")
```

with

```python
@classmethod
def from_str(cls, value: str) -> "WeightsEnum":
    if value in cls.__members__:
        return cls.__members__[value]
    raise ValueError(f"Invalid value {value} for enum {cls.__name__}.")
```

I can send a separate PR for this if it looks good to you.
@yoshitomo-matsubara Thanks for the ideas. I think for now we can move forward with the capitalization of the enum values. Good spot on the optimization of the `from_str` method. Feel free to add me as reviewer on the PR so that we can discuss any potential name changes.
We've merged the prototype API into main TorchVision. The target was to do this as early as possible before the next release, to leave enough time for tests. Please keep sharing your feedback to help us iron out the rough edges. Issues/PRs are very welcome.
Hi, this feature is very nice. But how can one learn the training information (such as epochs, augmentations, etc.) through the new API? I think this information is directly tied to the reported mAP.

```python
class RetinaNet_ResNet50_FPN_Weights(WeightsEnum):
    COCO_V1 = Weights(
        url="https://download.pytorch.org/models/retinanet_resnet50_fpn_coco-eeacb38b.pth",
        transforms=ObjectDetection,
        meta={
            "task": "image_object_detection",
            "architecture": "RetinaNet",
            "publication_year": 2017,
            "num_params": 34014999,
            "categories": _COCO_CATEGORIES,
            "interpolation": InterpolationMode.BILINEAR,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#retinanet",
            "map": 36.4,
        },
    )
    DEFAULT = COCO_V1
```

Is it possible to attach a link to indicate these key pieces of information? Currently, through https://pytorch.org/vision/stable/models.html#runtime-characteristics, I still cannot know the training strategy. Was it trained by the referenced training strategy? I am very confused.
It's right there in your code snippet: the `recipe` field in `meta` links to the training reference.
You can find the training setting (training command) at the URL in the `recipe` field.
@hhaAndroid This is very fair feedback; we must fix our documentation. The exact training details should already be there (else it's a bug and we need to fix it ASAP), but it's a bit of a mess right now because we literally just released the Multi-weight support API. I confirm that what @divideconcept and @xiaohu2015 told you is correct: the `recipe` entry in the meta-data links to the exact training reference used to produce the weights.

In Q2, as part of our documentation revamp, we plan to create cleaner model documentation. @NicolasHug is already looking into this and has created some prototypes at #5577. There is a dedicated feedback issue for our documentation: #5511.
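As a quick illustration, the training recipe (and the rest of the meta-data) can be read straight off the weights enum; a minimal sketch, assuming the `RetinaNet_ResNet50_FPN_Weights` enum shown above is importable from `torchvision.models.detection` after the merge:

```python
from torchvision.models.detection import RetinaNet_ResNet50_FPN_Weights

w = RetinaNet_ResNet50_FPN_Weights.COCO_V1
print(w.meta["recipe"])  # link to the exact training reference
print(w.meta["map"])     # reported COCO mAP for these weights
```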
@datumbox OK, thank you for the patient reply. I'm confused because the mAP is 36.4 when RetinaNet is trained for 26 epochs; according to my understanding, 12 epochs can already achieve this accuracy. So what am I not understanding?
Yes, we really need a document describing the full training details.
You are right; it should get the same mAP with a 1x (12-epoch) training schedule.
So why did the accuracy not improve at 26 epochs?
Agreed. Any thoughts on whether this should be on the documentation page for the specific model builder (for example `retinanet_resnet50_fpn`) or on a more general page?
The RetinaNet model you refer to is very old and wasn't trained by any current member of our team. Unfortunately, we don't have the detailed training log, but we know it was trained for 26 epochs. You are right to say that this was possibly overkill, but that's how it was produced and that's why we record it as such. Having said that, our team has now put processes in place to ensure our work is fully reproducible, and thorough checks are done prior to merging weights.

Finally, if you have more feedback about our documentation, we would love to hear it at #5511. If you have feedback concerning the new model builders, use this thread.
Hi @datumbox, I am updating my personal vision library to take into account the new API, and I find it a bit inconvenient to work with, mostly because it requires users to manually specify the weights for different models. Suppose that for some application we always use pre-trained weights; then in the old API the user only needs to provide a single argument, the model name:

```python
for m in ['resnet50', 'resnet101', 'resnet152']:  # etc.
    model = torchvision.models.__dict__[m](pretrained=True)
```

With the new API, there is a second argument that depends on the first:

```python
model = models.__dict__['resnet50'](weights=ResNet50_Weights.DEFAULT)
model = models.__dict__['resnet101'](weights=ResNet101_Weights.DEFAULT)
```

This is rather cumbersome (and a bit weird), because if my model is `resnet50` then obviously the weights should come from `ResNet50_Weights`. I would suggest the following:

```python
model = models.__dict__['resnet50'](weights='default')  # Use ResNet50_Weights.DEFAULT
model = models.__dict__['resnet50'](weights='v1')       # Use ResNet50_Weights.IMAGENET1K_V1
model = models.__dict__['resnet50'](weights='v2')       # Use ResNet50_Weights.IMAGENET1K_V2
model = models.__dict__['resnet50'](weights=ResNet50_Weights.DEFAULT)  # Use ResNet50_Weights.DEFAULT
```

This way we would have the best of both worlds (the old and new APIs).
@netw0rkf10w Thanks a lot for the feedback. If I understand you correctly, you would like to be able to initialize the weights using strings instead of providing the whole enum. Is that right? If that's what you mean, this is already supported. Examples:

```python
# These all work:
model = models.__dict__['resnet50'](weights='DEFAULT')        # Use ResNet50_Weights.DEFAULT
model = models.__dict__['resnet50'](weights='IMAGENET1K_V1')  # Use ResNet50_Weights.IMAGENET1K_V1
model = models.__dict__['resnet50'](weights='IMAGENET1K_V2')  # Use ResNet50_Weights.IMAGENET1K_V2

# Or the shorter version
model = resnet50(weights="DEFAULT")
model = resnet50(weights="IMAGENET1K_V1")
model = resnet50(weights="IMAGENET1K_V2")
```

Currently the values must be all capitals (matching their enum values). Unfortunately we can't make them shorter (aka "v1", "v2") because our intention is to offer many more pre-trained weights on various datasets. In fact, we've already added weakly-supervised weights trained on Instagram data (see #5708), and we plan to increase the number of datasets in the future. Thus we've opted for including the dataset name in the weight name.

So since this is supported, why is it not yet well documented? Two main reasons: we prefer users to use the enums directly (with plain strings you lose easy access to the bundled transforms and meta-data), and the documentation for the freshly released API is still catching up.

In the near future we also plan to build better registration mechanisms, so that you don't have to do:

```python
# no more hacks like this:
model = models.__dict__['resnet50'](weights='DEFAULT')
# but instead something like this:
model = models.get('resnet50')(weights='DEFAULT')
```

We didn't roll this out yet because we want to offer a similar mechanism as in the new Datasets API, which is in development. If you have more comments/concerns, or if I completely misunderstood what you meant, please let me know. Your feedback is very welcome. :)
@datumbox Thanks a lot for the detailed reply!

But this is exactly what I was looking for!! I didn't know that this is already supported; there's no documentation other than your blog post, which didn't mention this feature (perhaps it was not yet implemented at that time).

Thanks. I'll join the discussion in #5833 later (though I feel that general/common documentation for the new weights API is currently more needed than the per-model documentation proposed in #5833; what I mean by general documentation could be similar to a succinct summary of your blog post, but updated with all the new features).

This is indeed a nice-to-have feature. Looking forward to its release! Cheers!
@datumbox Is there any plan to add weights pre-trained (with self-supervision) on ImageNet-21K for some of the models?
No immediate plans, but it's something we can consider for future models. Up until recently that was not an option, purely because we couldn't deploy multiple weights. Now that it's possible, we can assess it. I think this merits its own issue and would require some discussion about which models we should support.
@datumbox Thanks. Would you be interested in a PR? ;)
Potentially, but this requires discussion to understand the exact proposal and plan. We have a new Model Contribution Guide, so adding weights to TorchVision is definitely a possibility. The devil is in the details, though, so I recommend you start a new issue and tag us there to discuss the details. :)
Sorry everyone, I have had some time off since feedback was requested on this topic. But I'm quite happy with the proposed design, both as a user and as a contributor.

To me, that leads to a few follow-up challenges. To the best of my limited knowledge, the last two of those haven't been ignored, so it looks quite exciting to me 👍
This might be a minor gripe, but I would personally see

```python
model = resnet50(weights=resnet50.weights.DEFAULT)
# Of course also
model = resnet50(weights=resnet50.weights["DEFAULT"])
```

as the most pleasant implementation for these enums. The user would not require two different imports to create a model (the model creation and the weights). Currently I am using:

```python
backbone = "resnet50"
anchor_sizes = (
    (64,),
    (128,),
    (256,),
    (512,),
    (812,),
)
aspect_ratios = ((0.5, 0.75, 1.0, 1.33, 2.0),) * len(anchor_sizes)
anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)

# Here the new weight system feels cumbersome and causes a lot of imports
if backbone == "resnet50":
    weights = ResNet50_Weights.DEFAULT
elif backbone == "resnetAny":
    # ....
    pass

backbone = resnet_fpn_backbone(
    backbone_name=backbone,
    weights=weights,
)
model = FasterRCNN(
    backbone,
    num_classes=num_classes,
    rpn_anchor_generator=anchor_generator,
)
```

Of course, after reading the discussion I realized that strings are accepted as the `weights` argument, which removes the need for those extra imports.
@vahvero Thanks for the feedback. I agree that the idiom you propose reads nicely and avoids the extra import.

Currently strings are a convenient shortcut which allows you to create a model with a single import. What I find a bit worrying is the fact that it took you a while to find out that they are supported. We mention it in the docs, but I wonder if it's not highlighted correctly. Thoughts?
I second that; I wasn't aware that I could use shortcut strings at all. I think the intro about how to use weights is fine and should stay where it is, but it could give a hint to users that weights can be specified either as an enum or as a string. Unless you think this complexifies the reading of the parameter too much!
@datumbox we can easily add a small sentence here which would say: "Weights can be specified as strings as well, e.g. 'DEFAULT' or '<some relevant weight>'." (See lines 327 to 330 in 91176e8.)

I'll send a PR tomorrow.
I routinely just read the source code, because most of the keyword arguments are left undocumented in the torch docs. Our use cases always require some changes, and I have found it easier to find the relevant information this way; in the case above, how the FPN changes with different anchor sizes. I would not suggest basing the goodness of the documentation on my experience. That said, these keywords have been documented and typed in the source code.

I would like to add to @divideconcept's answer that from Python 3.10 onward the `X | Y` union syntax could express the accepted types directly in the signature.

EDIT: If being really strict, the builders accept strings too, even where the annotations only mention the enum.
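A small sketch of what a string-inclusive annotation could look like (the signature is illustrative, not the actual torchvision definition; PEP 604's `X | Y` syntax requires Python 3.10 at runtime unless `from __future__ import annotations` defers evaluation):

```python
from __future__ import annotations  # allows the | syntax in annotations on older Pythons too
from typing import Any

# Illustrative signature only -- ResNet50_Weights/ResNet stand in for the real types.
def resnet50(*, weights: ResNet50_Weights | str | None = None,
             progress: bool = True, **kwargs: Any) -> "ResNet":
    ...
```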
@vahvero @divideconcept Thanks for the input. The honest truth is that originally I was not too eager to offer this shortcut, but I was compelled by the community feedback to make it available. Originally one key limitation of using strings was that you couldn't get the meta-data and preprocessing info associated with them. We later fixed this by providing the `get_weight` method.

I think adding the string to the types makes sense. I spoke with @NicolasHug and we plan to provide a registration mechanism as well this half. We can make the change on the types while we add the registration. Thanks a lot for helping us make a more streamlined API.
Hi @datumbox, I want to kindly bring your attention to an inconvenient use case in the adversarial ML community. A common use case there (as far as I know) is to compute the gradient w.r.t. the input, with two expectations: the input is the resized image before normalization, and the normalization happens inside the model so that gradients flow back to that input.

For example, with the old API, I can do the following:

```python
# Get data sample of the desired shape
dataset = ImageNet(..., transform=resize_and_center_crop_and_to_tensor)
x, y = dataset[0]

# Get model including the normalization layer
model = nn.Sequential(
    partial(F.normalize, mean=mean, std=std),  # just a demo, should be an nn.Module working like this
    resnet50(pretrained=True),
)

# Backward
x_t = torch.autograd.Variable(x.detach().clone(), requires_grad=True)
loss(model(x_t), y).backward()

# Do something to x
x_adv = x + x_t.grad
```

In the example above, I need two separate transforms: resize and normalize. However, the new API does not expose these details; it expects the user to resize & normalize the inputs before feeding them to the model. Hence, I have to do some hacks on the private API:

```python
# Get full transforms from weights Enum
preprocess = weights.transforms()

# Extract the resize part -- normalize with (0, 1) so it does not have any effect
resize = ImageClassification(
    crop_size=preprocess.crop_size[0],
    resize_size=preprocess.resize_size[0],
    mean=(0, 0, 0),
    std=(1, 1, 1),
    interpolation=preprocess.interpolation,
)

# Get model including the normalization layer
model = nn.Sequential(
    preprocess,  # we will send in the same size, so it will only normalize the input
    resnet50(weights=weights),
)

# Prepare the data
x = resize(x)

# Do something similar to the previous example
x_t = torch.autograd.Variable(x.detach().clone(), requires_grad=True)
loss(model(x_t), y).backward()
x_adv = x + x_t.grad
```

I hope I have been clear about my point. There might be simpler examples, but this is the best that I could think of. I believe there are other cases where people prefer to work with pre-resized images before normalization.

Back to the example above, I guess what I am expecting is something like:

```python
# Get full transforms from weights Enum
preprocess = weights.transforms()

# Extract the transforms up to before the normalization
prepare = preprocess.input_preparation

# Get model including the normalization layer
model = nn.Sequential(
    preprocess.input_normalization,
    resnet50(weights=weights),
)

# Prepare the data
x = prepare(x)

# Work on a model that expects inputs of the same shape, ranged within [0, 1]
model(x)
```

It would be great to hear your thoughts. I am not sure if this use case can be made easy from torchvision's side or my side. Thanks!
@Lodour Thanks a lot for the feedback, I appreciate the time you took to report it. This is an interesting use-case. I would like to explore with you whether the new API is the right solution for you or if we need a different solution.

The new API aims to serve the common case: bundle the exact inference transforms with each set of weights, so that users get correct preprocessing without copy-pasting. That approach is probably too restrictive for power users who need to modify or split the preprocessing pipeline.

Now, obviously your use-case is very valid and not properly supported by the API. What's unclear to me still is whether this API is the right option for you. I suspect that, given you want to construct a significantly different protocol than what was used for classification, you are better off checking the configuration of the bundled transforms and constructing your own. This will give you the freedom to adapt it to your problem without having to touch private code.

That's obviously one approach. There are others, including for us to decide to make those classes public and fully configurable. I just don't know if it is possible to produce a generic enough class that can address all use-cases while maintaining a clean and simple interface. This is a direction we considered initially, but we decided that users who want full control should define their own preprocessing. I would love to hear your thoughts on this. Looking forward to your reply.
@datumbox Thanks for your response! I agree with you that hacking into this private API is not the right option, although I really like the new API because of the benefits you have listed; previously I had to figure out those numbers from the docs and hardcode them in every project, especially the mean and std for normalization.

Then I guess this leads to where I am itching for: I want to implement my own pipeline but also get some benefits from the new API. For example, I would prefer not to hardcode the parameters of the pre-trained model's preprocessing (e.g., mean and std) if they are already included in the new API (right now in the private API).

Given your restriction on versioned transforms & the private class, it occurs to me that the hyper-parameters for preprocessing might be suitable for the weights' `meta` field:

```python
class ResNet50_Weights(WeightsEnum):
    IMAGENET1K_V1 = Weights(
        ...,
        meta={
            ...,
            "transforms": {
                "resize_size": 256,
                "crop_size": 224,
                "mean": ...,
                "std": ...,
            },
        },
    )
    IMAGENET1K_V2 = Weights(
        ...,
        meta={
            ...,
            "transforms": {
                # different parameters for V2
            },
        },
    )
```

I think having these parameters in the metadata could benefit other users with special use cases. Here it seems OK to rely on the metadata, but this might duplicate the hardcoded parameters between the metadata and the current invocation of `weights.transforms()`.

Looking forward to your thoughts on this option!
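If the parameters lived in `meta` as proposed, consumers could rebuild the preset (or just parts of it) without hardcoding anything; a sketch built on the hypothetical `meta["transforms"]` entry above (not an existing torchvision field), using the private preset class for illustration only:

```python
# Hypothetical: assumes the proposed meta["transforms"] dict exists.
from torchvision.models import ResNet50_Weights
from torchvision.transforms._presets import ImageClassification  # private module, illustration only

params = ResNet50_Weights.IMAGENET1K_V2.meta["transforms"]  # hypothetical key
preprocess = ImageClassification(
    crop_size=params["crop_size"],
    resize_size=params["resize_size"],
    mean=params["mean"],
    std=params["std"],
)
```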
🚀 Feedback Request
This issue is dedicated to collecting community feedback on the Multi-weight support API. Please review the dedicated article, where we describe the API in detail and provide an overview of its features.
We would love to get your thoughts, comments and input in order to finalize the API and include it in the new release of TorchVision.
cc @oke-aditya @frgfm @zhiqwang @xiaohu2015