Cleanups for FLAVA datasets #5164
Conversation
💊 CI failures summary and remediations: as of commit 3c70d81 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
Thanks for the review @pmeier. Following up on your #5130 (comment), let's use this PR to:
I'll mark it as draft and we can come back to this once the rest of the PRs are merged. EDIT: "all datasets" == all datasets that haven't been released yet.
@NicolasHug You are only talking about the "FLAVA" datasets here, right? Because for other datasets that would be BC-breaking and I want to avoid that, since we probably don't have time for a deprecation cycle before the API is deprecated in general.
Fully agreed @pmeier, sorry for not being clearer in my comment above.
vision/torchvision/datasets/pcam.py, line 78 at commit 4946827, is an anti-pattern. Since lines 117 to 119 at commit 4946827:
One minor comment inline. Plus let's also resolve #5220 (comment). Otherwise, LGTM if CI is green! Thanks @NicolasHug
-    for p in (self._base_folder / self._split_to_folder[self._split]).glob("**/*.png"):
-        self._labels.append(self.class_to_idx[p.parent.name])
-        self._image_files.append(p)
+    self._samples = make_dataset(str(self._base_folder / self._split_to_folder[self._split]), extensions="png")
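For context, a minimal sketch of what this call does (not the PR's code; the directory path is hypothetical, and the call assumes the `torchvision.datasets.folder.make_dataset` signature where `class_to_idx` is optional): `make_dataset` walks a `<root>/<class_name>/<file>` layout and returns `(path, class_index)` pairs, which is exactly what the removed glob loop assembled by hand.

```python
# Hedged sketch, not the PR's code: make_dataset keeps files matching the given
# extensions and returns (path, class_index) tuples, the same information the
# removed loop collected into _image_files and _labels.
from torchvision.datasets.folder import make_dataset

samples = make_dataset("/data/my_dataset/train", extensions=(".png",))  # hypothetical path
image_files = [path for path, _ in samples]
labels = [label for _, label in samples]
```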
Suggested change:
-    self._samples = make_dataset(str(self._base_folder / self._split_to_folder[self._split]), extensions="png")
+    self._samples = make_dataset(str(self._base_folder / self._split_to_folder[self._split]), extensions=(".png",))
I believe it ultimately comes down to str.endswith, which accepts tuples but also just plain strings. I think the type annotations are incorrect here.
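A quick illustration of that point (plain Python, nothing from the PR): `str.endswith` accepts either a single suffix string or a tuple of suffixes, so `extensions="png"` happens to work at runtime even if the annotations promise a tuple.

```python
# str.endswith matches a single suffix string or any suffix in a tuple.
print("00001.png".endswith("png"))             # True: plain string suffix
print("00001.png".endswith((".png",)))         # True: one-element tuple
print("00001.png".endswith((".jpg", ".png")))  # True: any suffix in the tuple matches
```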
Let me send a PR to fix that.
Failure is unrelated, I'll merge. Thanks for the review!
Summary:
* Change default of download for Food101 and DTD
* Set download default to False and put it at the end
* Keep stuff private
* GTSRB: train -> split. Also use pathlib
* mypy
* Remove split and partition for SUN397
* mypy
* mypy
* move download param for SST2
* Use make_dataset in SST2
* Use a base URL for GTSRB
* Let's make this code more complicated than it needs to be because why not

Reviewed By: jdsgomes, prabhat00155
Differential Revision: D33739381
fbshipit-source-id: a2bcfcdc2296ffe62f8e75c8107ff1d0a87957f1
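A hedged usage sketch of the GTSRB `train -> split` change listed above (keyword names follow the summary entry; the root path is hypothetical, and this is not a verbatim diff from the PR):

```python
from torchvision.datasets import GTSRB

# Old-style call with a boolean flag (pre-PR):
# ds = GTSRB(root="/data", train=True, download=True)

# New-style call with a split string, consistent with the other datasets:
ds = GTSRB(root="/data", split="train", download=True)
```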
Towards the end of #5108
All datasets but 2 have `download=False` as the default, so this PR sets the default to `False` as well for `Food101` and `DTD` for consistency. It also documents the `download` parameter for `Food101`, which was missing from the docstring. See #5164 (comment) for the complete set of changes.
cc @pmeier
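A hedged usage sketch of the new `download` default described in this PR (keyword names as in the torchvision constructors; the root path is hypothetical):

```python
# Food101 and DTD now default to download=False like the other datasets,
# so downloading must be requested explicitly.
from torchvision.datasets import DTD, Food101

food = Food101(root="/data", split="train", download=True)  # opt in to downloading
dtd = DTD(root="/data", split="train")                      # assumes the data is already on disk
```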