-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Add Rendered sst2 dataset #5220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💊 CI failures summary and remediationsAs of commit 39b2441 (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚
🚧 1 ongoing upstream failure:These were probably caused by upstream breakages that are not fixed yet.
This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print(self._labels) | ||
print(self._image_files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops :p
print(self._labels) | |
print(self._image_files) |
for p in (self._base_folder / self._split).glob("**/*.png"): | ||
self._labels.append(self.class_to_idx[p.parent.name]) | ||
self._image_files.append(p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is something that make_dataset()
could be used for. But the code here is very simple so IMHO it's fine to keep as-is (perhaps @pmeier can share his thoughts).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, make_dataset
could make this even shorter:
Either
self._image_files, self._labels = zip(*make_dataset(str(self._base_folder / self._split)))
or
self._samples = make_dataset(str(self._base_folder / self._split))
and do
image_file, label = self._samples[idx]
in __getitem__
.
No strong opinion, but if we go for make_dataset
, I would prefer the latter option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took care of that in #5164 !
(self._base_folder / self._split / class_label).exists() | ||
and (self._base_folder / self._split / class_label).is_dir() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think that is_dir()
properly returns False when the directory does not exist, so perhaps we can avoid using exists()
:
In [1]: from pathlib import Path
In [2]: Path("alfjnaljefeajlfbaeljnaljen").is_dir()
Out[2]: False
test/test_datasets.py
Outdated
root_folder = pathlib.Path(tmpdir) / "rendered-sst2" | ||
image_folder = root_folder / config["split"] | ||
|
||
num_images_per_class = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To slightly increase robustness:
num_images_per_class = 5 | |
num_images_per_class = {"train": 5, "test": 6, "val": 7} |
test/test_datasets.py
Outdated
@@ -2665,5 +2665,27 @@ def inject_fake_data(self, tmpdir: str, config): | |||
return num_images | |||
|
|||
|
|||
class RenderedSST2TestCase(datasets_utils.ImageDatasetTestCase): | |||
DATASET_CLASS = datasets.RenderedSST2 | |||
FEATURE_TYPES = (PIL.Image.Image, int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we don't need this line as it's the default of the datasets_utils.ImageDatasetTestCase
class
Thanks a lot @jdsgomes ! |
Summary: * Adding multiweight support for shufflenetv2 prototype models * Revert "Adding multiweight support for shufflenetv2 prototype models" This reverts commit 31fadbe. * Adding multiweight support for shufflenetv2 prototype models * Revert "Adding multiweight support for shufflenetv2 prototype models" This reverts commit 4e3d900. * Add RenderedSST2 dataset * Address PR comments * Fix bug in dataset verification Reviewed By: jdsgomes, prabhat00155 Differential Revision: D33739391 fbshipit-source-id: b9d64694e115db08a07c08763ab8c5a18421f6d2 Co-authored-by: Nicolas Hug <[email protected]>
Addresses #5108
cc @pmeier @NicolasHug