-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Add GTSRB dataset to prototypes #5214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+337
−34
Merged
Changes from all commits
Commits
Show all changes
44 commits
Select commit
Hold shift + click to select a range
237a707
Change default of download for Food101 and DTD
NicolasHug bc3be4e
WIP
NicolasHug 85ca229
Merge branch 'main' of github.com:pytorch/vision into defaultdownload
NicolasHug 87695d4
Set download default to False and put it at the end
NicolasHug 1e6e37d
Keep stuff private
NicolasHug 474546f
GTSRB: train -> split. Also use pathlib
NicolasHug a38a18b
mypy
NicolasHug d58ef16
Remove split and partition for SUN397
NicolasHug 5061141
mypy
NicolasHug 6c02cff
mypy
NicolasHug d3cb34f
Merge branch 'main' of github.com:pytorch/vision into gtsrb_prototype
NicolasHug d288c6c
Merge branch 'defaultdownload' into gtsrb_prototype
NicolasHug 521b75c
WIP
NicolasHug 1c1ceb0
WIP
NicolasHug 1b2ee27
Merge branch 'main' of github.com:pytorch/vision into gtsrb_prototype
NicolasHug 4fdb976
WIP
NicolasHug a6ae4c4
Add tests
NicolasHug 761e5d7
Add some types
NicolasHug 1dd6efe
lmao mypy you funny lad
NicolasHug a32ab88
fix unpacking
NicolasHug 862187a
Merge branch 'main' of github.com:pytorch/vision into gtsrb_prototype
NicolasHug e487828
Use DictWriter
NicolasHug 8f15cc3
Hardcode categories since they are just ints in [0, 42]
NicolasHug 9ac22d3
Split URL root
NicolasHug 1f1fa35
Use name instead of stem
NicolasHug f25a83a
Add category to labels, and fix dict reading
NicolasHug 52ec648
Use path_comparator
NicolasHug 379876f
Use buffer_size=1
NicolasHug 632c212
Merge branch 'main' of github.com:pytorch/vision into gtsrb_prototype
NicolasHug 0d6b58d
Merge branch 'main' of github.com:pytorch/vision into gtsrb_prototype
NicolasHug e26b456
Use Zipper instead of IterKeyZipper
NicolasHug b958b6b
mypy
NicolasHug 06c0904
Some more instructions
NicolasHug 18b87e2
forgot backquotes
NicolasHug 44bb8f1
Apply suggestions from code review
NicolasHug c1ec16d
gt -> ground_truth
NicolasHug ff78c70
e -> sample
NicolasHug cd38e25
Add support for bboxes
NicolasHug 1e8aea6
Update torchvision/prototype/datasets/_builtin/gtsrb.py
NicolasHug 8e9a617
format
NicolasHug 6703710
Remove unused method
NicolasHug 6b67ce7
Add test for label matching
NicolasHug 1ef84e0
Update test/test_prototype_builtin_datasets.py
NicolasHug 8283332
Merge branch 'main' into gtsrb_prototype
NicolasHug File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmeier LMK what you think of this.
Down below I set
buffer_size=1
in the Zipper, because in the original .zip files, both theimage_dp
and thegt_dp
are fully aligned: they both contain images 00001, 00002, etc. in this order. So I'm assuming thatbuffer_size=1
is better thanbuffer_size=UNLIMITED
?Without this call to
sorted()
, the tests would fail: the .zip archive created bymake_zip
would contain the files in a shuffled order (becausefiles
is a set), and soimage_dp
and thegt_dp
would not be aligned anymore, leading to a failure to match keys in the Zipper. (Note: this is only a problem in the tests; the code works fine otherwise on my custom script iterating over the dataset).I hope this won't make other tests fails. This might not be a problem that we have right now, but perhaps something to keep in mind for the future: we might need the test archives to exactly match the order of the "original" archives.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I didn't think of that. Do you know if the order of the returned paths of
pathlib.Path.glob()
is stable? If yes, we could simply replace the sets in_split_files_or_dirs
with lists instead of sorting here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about
Path.glob()
. I know thatglob.glob
has no guaranteed order, but I don't thinkPath.glob()
relies on it. Maybe the safest is to not assume a specific order.BTW, slightly related, what was the reason to use sets instead of lists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It think the reason was to avoid duplicates, but I don't remember if there was a case where I hit something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried using lists rather than sorting afterwards? If CI is not complaining for other datasets, I feel like that would be the better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just saw this call to
remove()
which might the reason for using sets:vision/test/datasets_utils.py
Lines 862 to 863 in afda28a
I can still switch to lists if you'd like, I guess I would have to write something like
LMK which one you prefer