Skip to content

Add Rendered sst2 dataset #5220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jan 20, 2022
Merged

Conversation

jdsgomes
Copy link
Contributor

@jdsgomes jdsgomes commented Jan 19, 2022

Addresses #5108

cc @pmeier @NicolasHug

@facebook-github-bot
Copy link

facebook-github-bot commented Jan 19, 2022

💊 CI failures summary and remediations

As of commit 39b2441 (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet.


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jdsgomes , this looks great! I only have very minor comments below. I'll approve now, perhaps @pmeier can give this a quick look too?

Comment on lines 58 to 59
print(self._labels)
print(self._image_files)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops :p

Suggested change
print(self._labels)
print(self._image_files)

Comment on lines 55 to 57
for p in (self._base_folder / self._split).glob("**/*.png"):
self._labels.append(self.class_to_idx[p.parent.name])
self._image_files.append(p)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is something that make_dataset() could be used for. But the code here is very simple so IMHO it's fine to keep as-is (perhaps @pmeier can share his thoughts).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, make_dataset could make this even shorter:

Either

self._image_files, self._labels = zip(*make_dataset(str(self._base_folder / self._split)))

or

self._samples = make_dataset(str(self._base_folder / self._split))

and do

image_file, label = self._samples[idx]

in __getitem__.

No strong opinion, but if we go for make_dataset, I would prefer the latter option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took care of that in #5164 !

Comment on lines 82 to 83
(self._base_folder / self._split / class_label).exists()
and (self._base_folder / self._split / class_label).is_dir()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think that is_dir() properly returns False when the directory does not exist, so perhaps we can avoid using exists():

In [1]: from pathlib import Path

In [2]: Path("alfjnaljefeajlfbaeljnaljen").is_dir()
Out[2]: False

root_folder = pathlib.Path(tmpdir) / "rendered-sst2"
image_folder = root_folder / config["split"]

num_images_per_class = 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To slightly increase robustness:

Suggested change
num_images_per_class = 5
num_images_per_class = {"train": 5, "test": 6, "val": 7}

@@ -2665,5 +2665,27 @@ def inject_fake_data(self, tmpdir: str, config):
return num_images


class RenderedSST2TestCase(datasets_utils.ImageDatasetTestCase):
DATASET_CLASS = datasets.RenderedSST2
FEATURE_TYPES = (PIL.Image.Image, int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we don't need this line as it's the default of the datasets_utils.ImageDatasetTestCase class

@NicolasHug
Copy link
Member

Thanks a lot @jdsgomes !

@NicolasHug NicolasHug merged commit e32b19e into pytorch:main Jan 20, 2022
facebook-github-bot pushed a commit that referenced this pull request Jan 26, 2022
Summary:
* Adding multiweight support for shufflenetv2 prototype models

* Revert "Adding multiweight support for shufflenetv2 prototype models"

This reverts commit 31fadbe.

* Adding multiweight support for shufflenetv2 prototype models

* Revert "Adding multiweight support for shufflenetv2 prototype models"

This reverts commit 4e3d900.

* Add RenderedSST2 dataset

* Address PR comments

* Fix bug in dataset verification

Reviewed By: jdsgomes, prabhat00155

Differential Revision: D33739391

fbshipit-source-id: b9d64694e115db08a07c08763ab8c5a18421f6d2

Co-authored-by: Nicolas Hug <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants