
[proposal] Use self.flatten instead of torch.flatten and when becomes possible derive ResNet from nn.Sequential (scripting+quantization is blocker), would simplify model surgery in the most frequent cases #3331


Open
vadimkantorov opened this issue Jan 31, 2021 · 23 comments


@vadimkantorov

vadimkantorov commented Jan 31, 2021

Currently, in https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py#L243:

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

If it instead used x = self.flatten(x), it would simplify model surgery: del model.avgpool, model.flatten, model.fc. In that case the class could also just derive from nn.Sequential and use an OrderedDict to pass the submodules (like in https://discuss.pytorch.org/t/ux-mix-of-nn-sequential-and-nn-moduledict/104724/2?u=vadimkantorov), which would preserve checkpoint compatibility as well. The forward method could then be removed.
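
For illustration, here is a rough sketch (not the actual torchvision code) of what that could look like; layer1..layer4 and the usual _make_layer plumbing are omitted for brevity, and ResNetAsSequential is just a placeholder name:

    from collections import OrderedDict

    import torch
    import torch.nn as nn

    class ResNetAsSequential(nn.Sequential):
        # Submodules are registered via an OrderedDict, so state_dict keys
        # (conv1, bn1, ..., fc) match the current ResNet and checkpoints stay compatible.
        def __init__(self, num_classes=1000):
            super().__init__(OrderedDict([
                ('conv1', nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)),
                ('bn1', nn.BatchNorm2d(64)),
                ('relu', nn.ReLU(inplace=True)),
                ('maxpool', nn.MaxPool2d(kernel_size=3, stride=2, padding=1)),
                # ('layer1', ...), ..., ('layer4', ...) built via _make_layer as today
                ('avgpool', nn.AdaptiveAvgPool2d((1, 1))),
                ('flatten', nn.Flatten(1)),
                ('fc', nn.Linear(64, num_classes)),  # 64 only because layer1..layer4 are omitted here
            ]))
            # No forward() needed: nn.Sequential runs the submodules in order.

    model = ResNetAsSequential()
    del model.avgpool, model.flatten, model.fc  # the model surgery described above
    print(model(torch.randn(1, 3, 224, 224)).shape)  # raw feature map instead of logits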

@datumbox
Contributor

datumbox commented Feb 1, 2021

Thanks for the proposal. Replacing torch.flatten with nn.Flatten seems like it won't break anything, but could you describe your use case a bit more and how it will help you achieve what you want?

Concerning doing a major refactoring and inheriting from Sequential, this might have side effects on all the other models that depend on resnet (segmentation, object detection etc.), so I'm not sure if it can be done in a backward-compatible manner. Note also that replacing the forward is not something we can do, due to how quantization works:

def forward(self, x):
    x = self.quant(x)
    # Ensure scriptability
    # super(QuantizableResNet,self).forward(x)
    # is not scriptable
    x = self._forward_impl(x)
    x = self.dequant(x)
    return x

@fmassa Let me know if you see any problems about replacing flatten.

@fmassa
Member

fmassa commented Feb 1, 2021

I would be fine replacing torch.flatten with nn.Flatten, although it would only simplify model surgery if we were to make further modifications to the ResNet model as you suggested (making it inherit from nn.Sequential). Without this, del model.fc etc. alone wouldn't be enough.

So from this perspective, there is limited value in replacing torch.flatten with nn.Flatten, as one is only syntactic sugar for the other.

@vadimkantorov
Author

vadimkantorov commented Feb 1, 2021

Oh, I didn't realize:

# Ensure scriptability 
# super(QuantizableResNet,self).forward(x) 
# is not scriptable 

If this becomes fixed, I guess inheriting from nn.Sequential should be fine.

My real use case: I'm working with a codebase, https://github.com/ajabri/videowalk/blob/master/code/resnet.py#L41, that had to reimplement the ResNet forward because it wants to do simple model surgery.

@fmassa
Member

fmassa commented Feb 2, 2021

@vadimkantorov model surgery in PyTorch generally requires re-writing forward or other parts of the model, except for nn.Sequential, so I would say it's a valid requirement to ask users to be a bit more verbose.

@vadimkantorov
Author

model surgery in PyTorch generally requires re-writing forward or other parts of the model, except for nn.Sequential

Of course. I understand that in general that's the case, but if it can be made simpler, I think it should be, even if in the general case full re-writing is required. In this particular case, it seems that because of quantization scripting limitations deriving from nn.Sequential is a no-go, but once that's solved it would be nice to derive ResNet from nn.Sequential.

I'll rename the issue accordingly.

@vadimkantorov vadimkantorov changed the title [proposal] Use nn.Flatten in ResNet [proposal] Derive ResNet from nn.Sequential when it becomes possible (scripting+quantization is blocker), would simplify model surgery in the simplest cases Feb 2, 2021
@vadimkantorov
Author

vadimkantorov commented Feb 10, 2021

It seems that deriving from nn.Sequential would also be good from a DeepSpeed-compatibility standpoint: pytorch/pytorch#51574

For the quantization case, couldn't it just insert quant/dequant modules at the beginning and the end of the Sequential? Then everything should work and no modification of forward would be needed.

@vadimkantorov
Author

vadimkantorov commented Feb 10, 2021

They are already modules:

        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()

One would just need to modify the constructor and insert them before conv1 and after layer4.
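
To make the suggestion concrete, a rough sketch (not the actual torchvision QuantizableResNet; the backbone here is a toy stand-in for conv1 ... layer4 ... fc):

    from collections import OrderedDict

    import torch
    import torch.nn as nn

    backbone = nn.Sequential(OrderedDict([
        ('conv1', nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)),
        ('relu', nn.ReLU(inplace=True)),
        ('avgpool', nn.AdaptiveAvgPool2d((1, 1))),
        ('flatten', nn.Flatten(1)),
        ('fc', nn.Linear(64, 10)),
    ]))

    # Insert the stubs as the first and last entries of the Sequential,
    # so no custom forward() is required.
    quantizable = nn.Sequential(OrderedDict(
        [('quant', torch.quantization.QuantStub())]
        + list(backbone.named_children())
        + [('dequant', torch.quantization.DeQuantStub())]
    ))
    print(quantizable(torch.randn(1, 3, 224, 224)).shape)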

@vadimkantorov vadimkantorov changed the title [proposal] Derive ResNet from nn.Sequential when it becomes possible (scripting+quantization is blocker), would simplify model surgery in the simplest cases [proposal] Use self.flatten instead of torch.flatten and when becomes possible derive ResNet from nn.Sequential (scripting+quantization is blocker), would simplify model surgery in the simplest cases Jun 6, 2021
@vadimkantorov vadimkantorov changed the title [proposal] Use self.flatten instead of torch.flatten and when becomes possible derive ResNet from nn.Sequential (scripting+quantization is blocker), would simplify model surgery in the simplest cases [proposal] Use self.flatten instead of torch.flatten and when becomes possible derive ResNet from nn.Sequential (scripting+quantization is blocker), would simplify model surgery in the most frequent cases Jun 6, 2021
@vadimkantorov
Author

vadimkantorov commented Jun 6, 2021

@fmassa Even if we just replace torch.flatten with self.flatten, it already simplifies model surgery: model.avgpool = model.flatten = model.fc = nn.Identity() (instead of del model.avgpool, model.flatten, model.fc).
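
For concreteness, a sketch of the difference (the flatten submodule is hypothetical; with today's hard-coded torch.flatten call, only avgpool and fc can be overridden this way):

    import torch
    import torch.nn as nn
    import torchvision

    model = torchvision.models.resnet18()

    # Works today: avgpool and fc are submodules, so they can be neutralized in place.
    model.avgpool = model.fc = nn.Identity()

    # With the proposed flatten submodule, the whole head could be disabled the same way:
    # model.avgpool = model.flatten = model.fc = nn.Identity()

    out = model(torch.randn(1, 3, 224, 224))
    print(out.shape)  # flattened layer4 features, since torch.flatten(x, 1) still runs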

@fmassa
Member

fmassa commented Jun 7, 2021

@vadimkantorov I think we should revisit how one does model surgery in PyTorch. In #3597 I have a prototype feature (which I'll be updating soon) that provides a much more robust and generic way of doing feature extraction.

As it relies on FX, and quantization also relies on FX for extracting the graph, I think this could be a good compromise which would address all your points, I believe.

@vadimkantorov
Author

I think deleting some upper layers is only tangentially related to intermediate feature extraction: model.avgpool = model.flatten = model.fc = nn.Identity() is a very simple and understandable way of removing some layers (and it doesn't involve a recent new technology).

I propose to still do self.flatten, even if you merge your new FX-based feature.

@fmassa
Member

fmassa commented Jun 7, 2021

@vadimkantorov the magic with the FX-based approach is that if you specify any part of the model, you can actually delete the end of the model (removing both the compute and the parameters). So it is actually a generalization of what you are proposing.

@vadimkantorov
Author

vadimkantorov commented Jun 7, 2021

I understand that it also solves this case :) But self.flatten is so simple, doesn't break any back-compat, and doesn't require learning the new compilation functionality just for this simple goal :)

I think both are worthwhile :) (And maybe even figuring out how to derive from nn.Sequential for the DeepSpeed benefits as well, but that's separate.)

And FX would probably still forward through the "layers-to-be-removed", and the large fully-connected layer can be quite costly.

@fmassa
Member

fmassa commented Sep 8, 2021

@vadimkantorov we merged #4302, which should provide a generic way of performing model surgery.

Could you give this a try and provide us feedback if it is enough for your use-cases?

@vadimkantorov
Author

Very simple: remove the average pooling, remove the last fc layer. I feel that having to plug in generic FX / graph rewriting for this is overkill.

@fmassa
Member

fmassa commented Sep 8, 2021

@vadimkantorov the thing you just mentioned can be done in one line:

model_surgery = create_feature_extractor(model, ['layer4'])

or more generically

nodes, _ = get_graph_node_names(model)
model_surgery = create_feature_extractor(model, nodes[-3])

@vadimkantorov
Author

vadimkantorov commented Sep 8, 2021

I don't even need the fc layer to run. Will it still be run? Or will it be DCE'd?

I mean, I understand that FX is also a way to achieve this, but just being able to do del backbone.fc or backbone.fc = nn.Identity() is a much simpler way to achieve this for simple models.

@fmassa
Member

fmassa commented Sep 8, 2021

The FC layer (and its parameters) will be DCE'd from the graph and won't be executed, so this is taken care of for you.
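
A quick sanity check of that DCE behaviour (a sketch; the exact state_dict keys may vary by torchvision version):

    import torchvision
    from torchvision.models.feature_extraction import create_feature_extractor

    m = torchvision.models.resnet18()
    mm = create_feature_extractor(m, ['layer4'])

    print(any(k.startswith('fc.') for k in m.state_dict()))   # True: the original model has fc
    print(any(k.startswith('fc.') for k in mm.state_dict()))  # expected False: fc was pruned away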

@vadimkantorov
Author

vadimkantorov commented Sep 8, 2021

I mean, it's great to have a generic solution, but having an equivalent, much simpler way (when possible) that users already know how to use is also a win.

It will then also probably be less debuggable. E.g. can I put a breakpoint in the transformed code?

@fmassa
Member

fmassa commented Sep 8, 2021

The transformed code can be printed etc, and is executed as standard Python code by the Python interpreter, so you can jump to every line of it in a Python debugger.

For example

import torchvision
import torchvision.models.feature_extraction

m = torchvision.models.resnet18()
mm = torchvision.models.feature_extraction.create_feature_extractor(m, ['layer2'])
print(mm.code)

will give you

def forward(self, x : torch.Tensor):
    conv1 = self.conv1(x);  x = None
    bn1 = self.bn1(conv1);  conv1 = None
    relu = self.relu(bn1);  bn1 = None
    maxpool = self.maxpool(relu);  relu = None
    layer1_0_conv1 = getattr(self.layer1, "0").conv1(maxpool)
    layer1_0_bn1 = getattr(self.layer1, "0").bn1(layer1_0_conv1);  layer1_0_conv1 = None
    layer1_0_relu = getattr(self.layer1, "0").relu(layer1_0_bn1);  layer1_0_bn1 = None
    layer1_0_conv2 = getattr(self.layer1, "0").conv2(layer1_0_relu);  layer1_0_relu = None
    layer1_0_bn2 = getattr(self.layer1, "0").bn2(layer1_0_conv2);  layer1_0_conv2 = None
    add = layer1_0_bn2 + maxpool;  layer1_0_bn2 = maxpool = None
    layer1_0_relu_1 = getattr(self.layer1, "0").relu(add);  add = None
    layer1_1_conv1 = getattr(self.layer1, "1").conv1(layer1_0_relu_1)
    layer1_1_bn1 = getattr(self.layer1, "1").bn1(layer1_1_conv1);  layer1_1_conv1 = None
    layer1_1_relu = getattr(self.layer1, "1").relu(layer1_1_bn1);  layer1_1_bn1 = None
    layer1_1_conv2 = getattr(self.layer1, "1").conv2(layer1_1_relu);  layer1_1_relu = None
    layer1_1_bn2 = getattr(self.layer1, "1").bn2(layer1_1_conv2);  layer1_1_conv2 = None
    add_1 = layer1_1_bn2 + layer1_0_relu_1;  layer1_1_bn2 = layer1_0_relu_1 = None
    layer1_1_relu_1 = getattr(self.layer1, "1").relu(add_1);  add_1 = None
    layer2_0_conv1 = getattr(self.layer2, "0").conv1(layer1_1_relu_1)
    layer2_0_bn1 = getattr(self.layer2, "0").bn1(layer2_0_conv1);  layer2_0_conv1 = None
    layer2_0_relu = getattr(self.layer2, "0").relu(layer2_0_bn1);  layer2_0_bn1 = None
    layer2_0_conv2 = getattr(self.layer2, "0").conv2(layer2_0_relu);  layer2_0_relu = None
    layer2_0_bn2 = getattr(self.layer2, "0").bn2(layer2_0_conv2);  layer2_0_conv2 = None
    layer2_0_downsample_0 = getattr(getattr(self.layer2, "0").downsample, "0")(layer1_1_relu_1);  layer1_1_relu_1 = None
    layer2_0_downsample_1 = getattr(getattr(self.layer2, "0").downsample, "1")(layer2_0_downsample_0);  layer2_0_downsample_0 = None
    add_2 = layer2_0_bn2 + layer2_0_downsample_1;  layer2_0_bn2 = layer2_0_downsample_1 = None
    layer2_0_relu_1 = getattr(self.layer2, "0").relu(add_2);  add_2 = None
    layer2_1_conv1 = getattr(self.layer2, "1").conv1(layer2_0_relu_1)
    layer2_1_bn1 = getattr(self.layer2, "1").bn1(layer2_1_conv1);  layer2_1_conv1 = None
    layer2_1_relu = getattr(self.layer2, "1").relu(layer2_1_bn1);  layer2_1_bn1 = None
    layer2_1_conv2 = getattr(self.layer2, "1").conv2(layer2_1_relu);  layer2_1_relu = None
    layer2_1_bn2 = getattr(self.layer2, "1").bn2(layer2_1_conv2);  layer2_1_conv2 = None
    add_3 = layer2_1_bn2 + layer2_0_relu_1;  layer2_1_bn2 = layer2_0_relu_1 = None
    layer2_1_relu_1 = getattr(self.layer2, "1").relu(add_3);  add_3 = None
    return {'layer2': layer2_1_relu_1}

which is what the Python interpreter will execute.

@vadimkantorov
Author

vadimkantorov commented Sep 8, 2021

Well, it's great to have, but compared to just stepping into resnet.py, this is not the same :) jk

I agree this is a great, powerful feature that allows using a lot of backbones, and it's good that it unblocks this issue, but I don't see a reason not to improve the basic ResNet by moving flatten into an attribute and maybe turning the quantized ResNet into a Sequential.

@fmassa
Member

fmassa commented Sep 8, 2021

I think the key idea here is that we are providing a single (generic?) solution that handles many more use-cases than what was possible before. Overriding modules with nn.Identity() was a hacky solution that didn't work most of the time (only inside nn.Sequential-style models), and can lead to confusion / silent bugs in many cases, especially for newer users, I think.

Having the forward implemented for the models (even if it is a straight Sequential) can still be beneficial for users, as it allows for the exact same things you have been advocating for in your past messages (namely seeing the execution code, more easily putting breakpoints / prints, etc.).

Still, users are free to use what they think best fit their needs.

@vadimkantorov
Author

vadimkantorov commented Sep 8, 2021

I was of course not talking about "newer users". Of course these hacks are for someone who already understands the backbone model source code and wants to avoid the boilerplate of copy-pasting the forward. This is no more "advanced" than the recommended re-initialization of individual model components (e.g. model.roi_heads.box_predictor) found in https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
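
For reference, the kind of component re-initialization that tutorial recommends looks roughly like this (paraphrased from the detection finetuning tutorial; num_classes is up to the user):

    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    num_classes = 2  # e.g. background + one object class
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)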

In-the-wild surgery by resetting model.fc = nn.Identity(): https://lernapparat.de/resnet-how-many-models/

@vadimkantorov
Author

One more reason for models to be nn.Sequential whenever possible: https://pytorch.org/docs/stable/checkpoint.html?highlight=checkpoint_sequential#torch.utils.checkpoint.checkpoint_sequential
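
A minimal sketch of that benefit (hypothetical toy model; checkpoint_sequential works on an nn.Sequential or a list of modules, re-running each segment during backward instead of storing all intermediate activations):

    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint_sequential

    model = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    )
    x = torch.randn(2, 3, 224, 224, requires_grad=True)
    out = checkpoint_sequential(model, 3, x)  # 3 checkpointed segments
    out.sum().backward()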
