Feature Pyramid Network code bug #2326

Closed
cl2227619761 opened this issue Jun 17, 2020 · 7 comments · Fixed by #2954

cl2227619761 commented Jun 17, 2020

for m in self.children():

This line does not work as intended; I think self.children() should be replaced with self.modules().

pmeier (Collaborator) commented Jun 17, 2020

Hi @cl2227619761, what do you mean by "this line does not work"? Please post a minimal example where you encounter an issue with this line.

fmassa (Member) commented Jun 17, 2020

Hi @cl2227619761

That is a very good catch!

@pmeier the issue is that self.children() returns only the immediate child modules, so in this case it yields inner_blocks and layer_blocks, neither of which is an nn.Conv2d, and the isinstance check never matches.
The fix is what @cl2227619761 mentioned: replace self.children() with self.modules().
But this raises the question of the impact of the initialization. If I remember correctly, we needed the custom initialization to get the best results, so I'm surprised that this was left out.

A PR fixing this would be great, but it would also be great if we could measure the impact of this incorrect initialization on the model's performance.

@mthrok if you find the time, it would be great to assess how much this mistake affects performance.
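
For illustration, here is a minimal sketch (not the torchvision FPN itself; ToyFPN and its channel sizes are made up for the example) showing why the loop over self.children() never reaches the conv layers:

import torch.nn as nn

# Toy module with the same nesting as the FPN: conv layers stored inside ModuleLists.
class ToyFPN(nn.Module):
    def __init__(self):
        super().__init__()
        self.inner_blocks = nn.ModuleList([nn.Conv2d(8, 4, 1) for _ in range(3)])
        self.layer_blocks = nn.ModuleList([nn.Conv2d(4, 4, 3, padding=1) for _ in range(3)])

m = ToyFPN()
# children() yields only the two ModuleLists, so isinstance(..., nn.Conv2d) never matches.
print([type(c).__name__ for c in m.children()])                # ['ModuleList', 'ModuleList']
# modules() recurses into the lists and reaches every nn.Conv2d.
print(sum(isinstance(mod, nn.Conv2d) for mod in m.modules()))  # 6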

pmeier (Collaborator) commented Jun 17, 2020

My bad, I misunderstood the comment above the loop:

# initialize parameters now to avoid modifying the initialization of top_blocks
for m in self.children():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_uniform_(m.weight, a=1)
        nn.init.constant_(m.bias, 0)
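
For reference, a sketch of the fix proposed in this thread, which only swaps the iterator so the recursion reaches the nn.Conv2d layers inside inner_blocks and layer_blocks:

# initialize parameters now to avoid modifying the initialization of top_blocks
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_uniform_(m.weight, a=1)
        nn.init.constant_(m.bias, 0)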

@gan3sh500

Can I help with this? If you point me to the model/scripts you used to train, I can train with both the correct init and the current one.

pmeier (Collaborator) commented Jun 21, 2020

@gan3sh500 You can find the training scripts here.

@gan3sh500

I’ll have to train on a single 2080 Ti. I’ve seen smaller batch sizes lead to worse convergence, but is it fine if I compare the Kaiming init and the current init, both trained under the same conditions?

fmassa (Member) commented Jul 30, 2020

@gan3sh500 sorry for the delay in replying.

If you change the number of GPUs, you'll need to adapt the learning rate to follow the linear scaling rule: if you divide the global batch size by 8x, you should also divide the learning rate by 8x.
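
A small worked example of that rule; the concrete numbers below (8 reference GPUs, 2 images per GPU, a base learning rate of 0.02) are assumptions for illustration, not necessarily the reference recipe's defaults:

# Linear scaling rule: the learning rate scales with the global batch size.
# All numbers here are illustrative assumptions, not the official training recipe.
reference_gpus = 8
images_per_gpu = 2
reference_lr = 0.02                                        # lr tuned for the multi-GPU run

reference_global_batch = reference_gpus * images_per_gpu   # 16
single_gpu_global_batch = 1 * images_per_gpu               # 2

scaled_lr = reference_lr * single_gpu_global_batch / reference_global_batch
print(scaled_lr)  # 0.0025, i.e. the reference lr divided by 8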
