Similarity learning reference code #1101

dakshjotwani · 2019-07-08T15:14:40Z

@fmassa This is the PR for the embedding learning reference code, as discussed in #1042. I decided not to create a VGGFace2 dataset for now, since that would require more thought and planning, so I'm using FashionMNIST (FMNIST) instead.

codecov-io · 2019-07-08T15:34:17Z

Codecov Report

Merging #1101 into master will increase coverage by 0.38%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1101      +/-   ##
==========================================
+ Coverage   64.57%   64.95%   +0.38%     
==========================================
  Files          68       68              
  Lines        5411     5413       +2     
  Branches      831      835       +4     
==========================================
+ Hits         3494     3516      +22     
+ Misses       1665     1641      -24     
- Partials      252      256       +4

Impacted Files	Coverage Δ
torchvision/models/detection/roi_heads.py	`55.93% <0%> (-0.97%)`	⬇️
torchvision/ops/boxes.py	`94.73% <0%> (ø)`	⬆️
torchvision/transforms/transforms.py	`81.53% <0%> (+0.98%)`	⬆️
torchvision/datasets/fakedata.py	`26.92% <0%> (+3.58%)`	⬆️
torchvision/datasets/svhn.py	`67.3% <0%> (+32.69%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3483342...3cfc08b.

fmassa

Thanks a lot for the PR!

I have made a few comments.

Also, is FashionMNIST a dataset that is generally used for embedding learning, or did you just pick it as an example? I really think that we should focus on a more realistic dataset, but we can change that in the future.

Also, it might be good to add a basic README explaining what the reference script is meant to do, so that users know what they are looking for. This is also missing for the other reference scripts, but I should add those in the future.

references/embedding/loss.py

references/embedding/sampler.py

references/embedding/train.py

references/embedding/model.py

references/embedding/train.py

fmassa · 2019-07-09T09:36:50Z

references/embedding/train.py

+    train_dataset = FashionMNIST(args.train_data, train=True, transform=transform, download=True)
+    test_dataset = FashionMNIST(args.test_data, train=False, transform=transform, download=True)
+
+    targets = train_dataset.targets.tolist()


A comment here would be helpful, as this is generally something that the user will need to change if they change the dataset.

I added comments mentioning that any classification dataset should be fine here as long as targets is constructed as described. Is that sufficient?

The comment saying that it should be any classification dataset is misleading, because not all datasets have the .targets attribute, even if they are classification datasets. Maybe just check that the dataset has a targets attribute, and raise a nice error message if not?
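A minimal sketch of the check suggested here, assuming the goal is simply to fail early with a readable message when a dataset lacks `.targets` (torchvision's `SVHN`, for instance, exposes `labels` rather than `targets`). The helper name `get_targets` is hypothetical and not part of the PR; it only mirrors the `train_dataset.targets.tolist()` line from the diff above.

```python
def get_targets(dataset):
    """Hypothetical helper: return a list whose i-th entry is the label of dataset[i].

    Raises a descriptive error instead of a bare AttributeError when the
    dataset does not expose a `targets` attribute.
    """
    targets = getattr(dataset, "targets", None)
    if targets is None:
        raise AttributeError(
            "{} has no `targets` attribute. Build a list whose i-th entry is the "
            "label of sample i and pass it in instead.".format(type(dataset).__name__)
        )
    # Tensors (e.g. FashionMNIST.targets) and NumPy arrays expose .tolist();
    # plain Python sequences are copied into a new list.
    return targets.tolist() if hasattr(targets, "tolist") else list(targets)
```

With this in place, `targets = get_targets(train_dataset)` would replace the direct attribute access, and swapping in a dataset without `.targets` produces an error that says what to do.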

dakshjotwani · 2019-07-09T15:48:09Z

@fmassa I have made all the requested changes, except the helper method change (I'm not sure which section you wanted moved into a method).

fmassa

This is looking very good, thanks!

I think some parts of this code (like the samplers) would be great to move into torchvision in the future, once we figure out where to put them. But for that, tests would be necessary.

I have a couple more comments, but I think this is almost ready to merge, thanks!

references/embedding/train.py

fmassa · 2019-07-10T09:09:08Z

references/embedding/train.py

+    train_dataset = FashionMNIST(args.train_data, train=True, transform=transform, download=True)
+    test_dataset = FashionMNIST(args.test_data, train=False, transform=transform, download=True)
+
+    targets = train_dataset.targets.tolist()


The comment saying that it should be any classification dataset is misleading, because not all datasets have the .targets attribute, even if they are classification datasets. Maybe just check that the dataset has a targets attribute, and raise a nice error message if not?

fmassa · 2019-07-10T09:11:48Z

references/embedding/sampler.py

+        self.k = k
+        self.groups = create_groups(groups, self.k)
+
+    def __iter__(self):


This is ok as is because we are not yet adding samplers to the main library, but once we move it to the torchvision package, it would be good to have tests for it.

If you could write a basic test now that checks the behavior (with dummy data), it would make it much easier to move this into torchvision core later on.
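As an illustration of the kind of dummy-data test being asked for, here is a minimal pure-Python P-K sampler with two tests. This is a sketch, not the PR's `PKSampler`: only the `self.k = k` / `create_groups(...)` lines of that class appear above, so the class name `PKSamplerSketch`, its `seed` parameter, and its handling of leftover samples are assumptions. In the reference scripts a sampler like this would presumably subclass `torch.utils.data.Sampler`.

```python
import random
from collections import defaultdict


class PKSamplerSketch:
    """Hypothetical P-K sampler (not the PR's implementation).

    Yields dataset indices so that every consecutive block of p * k indices
    covers exactly p distinct labels with k samples each. targets[i] is the
    label of dataset sample i. Labels with fewer than k samples are dropped,
    and per-label leftovers that cannot fill a chunk of k are skipped.
    """

    def __init__(self, targets, p, k, seed=None):
        self.p, self.k = p, k
        self.rng = random.Random(seed)
        groups = defaultdict(list)
        for idx, label in enumerate(targets):
            groups[label].append(idx)
        # Keep only labels that can supply k distinct samples.
        self.groups = {lbl: idxs for lbl, idxs in groups.items() if len(idxs) >= k}
        if len(self.groups) < p:
            raise ValueError("need at least p labels with >= k samples each")

    def __iter__(self):
        # Shuffle each label's indices and cut them into chunks of size k.
        chunks = {}
        for lbl, idxs in self.groups.items():
            shuffled = idxs[:]
            self.rng.shuffle(shuffled)
            chunks[lbl] = [shuffled[i:i + self.k]
                           for i in range(0, len(shuffled) - self.k + 1, self.k)]
        # Emit one batch at a time while at least p labels have unused chunks.
        while True:
            available = [lbl for lbl, c in chunks.items() if c]
            if len(available) < self.p:
                return
            for lbl in self.rng.sample(available, self.p):
                yield from chunks[lbl].pop()


def test_pk_sampler_batches():
    # Dummy data: 4 labels with 6 samples each.
    targets = [0] * 6 + [1] * 6 + [2] * 6 + [3] * 6
    p, k = 2, 3
    indices = list(PKSamplerSketch(targets, p, k, seed=0))
    assert len(indices) > 0 and len(indices) % (p * k) == 0
    assert len(set(indices)) == len(indices)  # no index repeated in an epoch
    for start in range(0, len(indices), p * k):
        labels = [targets[i] for i in indices[start:start + p * k]]
        assert len(set(labels)) == p
        assert all(labels.count(lbl) == k for lbl in set(labels))


def test_pk_sampler_rejects_too_few_labels():
    try:
        PKSamplerSketch([0, 0, 0, 1, 1, 1], p=3, k=3)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

The tests check invariants rather than exact index sequences, so they stay valid if the shuffling strategy changes. To get aligned batches, the sampler would be paired with a loader whose batch size is `p * k`.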

@fmassa I have written the test cases for the sampler and refactored the accuracy section into another method.
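The refactored accuracy helper itself (see the commit "Refactor threshold finder to helper method" below) is not shown in this thread. One common way to turn embeddings into an accuracy number is to compute a distance for each pair of test embeddings and search for the distance threshold that best separates same-label pairs from different-label pairs. The sketch below assumes that setup; the function name and its inputs are hypothetical.

```python
def find_best_threshold(distances, same_label):
    """Hypothetical helper: pick the distance threshold that best separates
    same-label pairs from different-label pairs.

    A pair is predicted "same" when its distance is <= threshold. Every
    observed distance is tried as a candidate; returns (threshold, accuracy).
    """
    if len(distances) != len(same_label) or not distances:
        raise ValueError("need equal-length, non-empty inputs")
    best_threshold, best_accuracy = None, -1.0
    for t in sorted(set(distances)):
        # Count pairs whose prediction (d <= t) matches the ground truth.
        correct = sum((d <= t) == s for d, s in zip(distances, same_label))
        accuracy = correct / len(distances)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy
```

This brute-force search is quadratic in the number of pairs; sorting the distances once and sweeping with running counts would bring it down to O(n log n) if the test set is large.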

I don't think we should assume that a targets attribute exists. Instead, I think I can explain the targets data structure better, so that users can construct targets accordingly (or use the targets attribute if it exists). I have changed the comments slightly to explain what is expected of the targets variable. Will this be okay instead?

dakshjotwani · 2019-07-12T09:16:19Z

@fmassa I have made the changes. Instead of expecting a targets attribute from the dataset, I elaborated further on the semantics of, and requirements for, the targets data structure, which users can build during or after initializing their dataset. Will that be ok?
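For a dataset that has no `targets` attribute, one way to build the structure described here after the dataset is initialized is a single pass over it. This sketch assumes `dataset[i]` returns a `(sample, label)` pair with an integer-like label, as torchvision's classification datasets do; the helper name `build_targets` is hypothetical.

```python
def build_targets(dataset):
    """Hypothetical fallback for datasets without a `targets` attribute.

    Returns a list whose i-th entry is the label of dataset[i], assuming
    dataset[i] is a (sample, label) pair with an integer-like label.
    """
    # One full pass over the data: every sample is loaded (and transformed),
    # so this can be slow for large image datasets.
    return [int(dataset[i][1]) for i in range(len(dataset))]
```

Because this loads every sample, reading labels directly from whatever attribute or annotation file the dataset already keeps is usually cheaper when one exists.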

dakshjotwani · 2019-07-16T14:24:58Z

Renamed embedding to similarity to be more consistent with existing literature. Both are used, but similarity is more common.

fmassa

LGTM, thanks a lot!

dakshjotwani added 4 commits July 5, 2019 22:49

Add loss, sampler, and train script

3915b99

Fix train script

ebacb9c

Add argparse

2748785

Fix lint

0ca670f

Change f strings to .format()

fc795e3

fmassa requested changes Jul 9, 2019

dakshjotwani added 9 commits July 9, 2019 16:31

Remove unused imports

21e6684

Change TripletMarginLoss to extend nn.Module

f057c24

Load eye uint8 tensors directly on device

db795ac

Refactor model.py to backbone=None

a7eeebe

Add docstring for PKSampler

5955e5e

Refactor evaluate() to take loader as arg instead

98e78bc

Change eval method to cat embeddings all at once

4979d92

Add dataset comments

ee3403c

Add README.md

1f35628

fmassa reviewed Jul 10, 2019

dakshjotwani added 4 commits July 10, 2019 17:56

Add tests for sampler

a4d774d

Refactor threshold finder to helper method

549aa90

Refactor targets comment

c0f4dc1

Fix lint

a03ff20

dakshjotwani changed the title ~~Embedding learning reference code~~ Similarity learning reference code Jul 16, 2019

Rename embedding to similarity (More consistent with existing literature)

3cfc08b

fmassa approved these changes Jul 17, 2019

fmassa merged commit bbd363c into pytorch:master Jul 17, 2019
