Add Rendered sst2 dataset #5220

jdsgomes · 2022-01-19T14:24:34Z

Addresses #5108

cc @pmeier @NicolasHug

This reverts commit 31fadbe.

This reverts commit 4e3d900.

facebook-github-bot · 2022-01-19T14:24:41Z

💊 CI failures summary and remediations

As of commit 39b2441 (more details on the Dr. CI page):

✅ None of the CI failures appear to be your fault 💚

1/1 broken upstream at merge base f670152 since Jan 14

🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet.

unittest_prototype since Jan 14 (adf8466)

This comment was automatically generated by Dr. CI.

NicolasHug

Thanks @jdsgomes, this looks great! I only have very minor comments below. I'll approve now; perhaps @pmeier can give this a quick look too?

NicolasHug · 2022-01-19T14:57:56Z

torchvision/datasets/rendered_sst2.py

+        print(self._labels)
+        print(self._image_files)


oops :p

Suggested change (delete both lines):

-        print(self._labels)
-        print(self._image_files)


NicolasHug · 2022-01-19T15:03:24Z

torchvision/datasets/rendered_sst2.py

+        for p in (self._base_folder / self._split).glob("**/*.png"):
+            self._labels.append(self.class_to_idx[p.parent.name])
+            self._image_files.append(p)


I think this is something that make_dataset() could be used for. But the code here is very simple so IMHO it's fine to keep as-is (perhaps @pmeier can share his thoughts).

Yeah, make_dataset could make this even shorter:

Either

self._image_files, self._labels = zip(*make_dataset(str(self._base_folder / self._split), extensions=("png",)))

or

self._samples = make_dataset(str(self._base_folder / self._split), extensions=("png",))

and do

image_file, label = self._samples[idx]

in __getitem__.

No strong opinion, but if we go for make_dataset, I would prefer the latter option.
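For readers unfamiliar with `make_dataset`, the `self._samples` option boils down to building a single list of `(path, class_index)` pairs and indexing into it. The sketch below is a stdlib-only stand-in, not torchvision code: `collect_samples` and `SamplesDataset` are hypothetical names. It assumes the `<split>/<class>/*.png` layout and mimics the sorted class-to-index and sorted-file behavior that `make_dataset` is generally understood to have. It returns raw paths instead of decoded images.

```python
import pathlib
from typing import Dict, List, Tuple


def collect_samples(
    split_dir: pathlib.Path, extension: str = ".png"
) -> List[Tuple[pathlib.Path, int]]:
    """Hypothetical stand-in for make_dataset: one (path, label) pair per file."""
    # Class names are the sorted sub-directory names; indices follow that order.
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_idx: Dict[str, int] = {name: idx for idx, name in enumerate(classes)}
    samples = []
    for name in classes:
        # Sort the files so the sample order is deterministic across platforms.
        for path in sorted((split_dir / name).rglob(f"*{extension}")):
            samples.append((path, class_to_idx[name]))
    return samples


class SamplesDataset:
    """Hypothetical minimal dataset that keeps one list of samples."""

    def __init__(self, split_dir: pathlib.Path) -> None:
        self._samples = collect_samples(split_dir)

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, idx: int) -> Tuple[pathlib.Path, int]:
        # A real dataset would open the image here (e.g. with PIL) and apply transforms.
        image_file, label = self._samples[idx]
        return image_file, label
```

One practical upside of a single list of pairs is that paths and labels cannot drift out of sync the way two parallel lists can.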

I took care of that in #5164!

NicolasHug · 2022-01-19T15:05:00Z

torchvision/datasets/rendered_sst2.py

+                (self._base_folder / self._split / class_label).exists()
+                and (self._base_folder / self._split / class_label).is_dir()


Nit: I think that is_dir() properly returns False when the directory does not exist, so perhaps we can avoid using exists():

In [1]: from pathlib import Path

In [2]: Path("alfjnaljefeajlfbaeljnaljen").is_dir()
Out[2]: False
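The simplified check can be sketched as follows. `Path.is_dir()` returns `False` both for a path that does not exist and for a regular file, so the extra `exists()` call adds nothing. The function name `check_exists` and the class folder names used in the usage check are assumptions for illustration, not necessarily what the dataset uses.

```python
import pathlib
from typing import Iterable


def check_exists(base_folder: pathlib.Path, split: str, classes: Iterable[str]) -> bool:
    """Hypothetical check: every class sub-folder of the split must be a directory."""
    # is_dir() is False for missing paths as well as for plain files,
    # so a separate exists() check is redundant.
    return all((base_folder / split / class_label).is_dir() for class_label in classes)
```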

NicolasHug · 2022-01-19T15:06:14Z

test/test_datasets.py

+        root_folder = pathlib.Path(tmpdir) / "rendered-sst2"
+        image_folder = root_folder / config["split"]
+
+        num_images_per_class = 5


To slightly increase robustness:

Suggested change:

-        num_images_per_class = 5
+        num_images_per_class = {"train": 5, "test": 6, "val": 7}
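Why distinct counts help: if every split had the same number of images, a dataset that accidentally loaded the wrong split would still report the expected length. Below is a minimal stdlib-only sketch, assuming the `rendered-sst2/<split>/<class>/*.png` layout from the test above and the class folder names `negative`/`positive`. `inject_fake_split` is a hypothetical helper; unlike the real test utilities, it writes empty placeholder files rather than actual images.

```python
import pathlib

NUM_IMAGES_PER_CLASS = {"train": 5, "test": 6, "val": 7}
CLASSES = ("negative", "positive")  # assumed class folder names


def inject_fake_split(tmpdir: str, split: str) -> int:
    """Hypothetical helper: create placeholder files and return how many were made."""
    image_folder = pathlib.Path(tmpdir) / "rendered-sst2" / split
    num_images_per_class = NUM_IMAGES_PER_CLASS[split]
    for class_label in CLASSES:
        class_folder = image_folder / class_label
        class_folder.mkdir(parents=True, exist_ok=True)
        for i in range(num_images_per_class):
            # Empty files stand in for images; they are not decodable PNGs.
            (class_folder / f"{i}.png").touch()
    return num_images_per_class * len(CLASSES)
```

With these counts, each split yields a different total (10, 12 and 14 files), so a length assertion can tell the splits apart.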

NicolasHug · 2022-01-19T15:06:29Z

test/test_datasets.py

@@ -2665,5 +2665,27 @@ def inject_fake_data(self, tmpdir: str, config):
        return num_images


+class RenderedSST2TestCase(datasets_utils.ImageDatasetTestCase):
+    DATASET_CLASS = datasets.RenderedSST2
+    FEATURE_TYPES = (PIL.Image.Image, int)


Nit: we don't need this line as it's the default of the datasets_utils.ImageDatasetTestCase class


NicolasHug · 2022-01-20T10:20:10Z

Thanks a lot @jdsgomes!

Summary:

* Adding multiweight support for shufflenetv2 prototype models
* Revert "Adding multiweight support for shufflenetv2 prototype models"
  This reverts commit 31fadbe.
* Adding multiweight support for shufflenetv2 prototype models
* Revert "Adding multiweight support for shufflenetv2 prototype models"
  This reverts commit 4e3d900.
* Add RenderedSST2 dataset
* Address PR comments
* Fix bug in dataset verification

Reviewed By: jdsgomes, prabhat00155

Differential Revision: D33739391

fbshipit-source-id: b9d64694e115db08a07c08763ab8c5a18421f6d2

Co-authored-by: Nicolas Hug <[email protected]>

jdsgomes and others added 10 commits October 29, 2021 10:32

Adding multiweight support for shufflenetv2 prototype models

31fadbe

Revert "Adding multiweight support for shufflenetv2 prototype models"

1e578b7

This reverts commit 31fadbe.

Merge branch 'pytorch:main' into main

85e4429

Adding multiweight support for shufflenetv2 prototype models

4e3d900

Revert "Adding multiweight support for shufflenetv2 prototype models"

615b612

This reverts commit 4e3d900.

Merge branch 'pytorch:main' into main

a0bbece

Merge branch 'pytorch:main' into main

ba966f4

Merge branch 'pytorch:main' into main

6cdd49b

Merge branch 'pytorch:main' into main

d4f1638

Add RenderedSST2 dataset

069bba4

pytorch-probot bot added the ciflow/default label Jan 19, 2022

facebook-github-bot added the cla signed label Jan 19, 2022

Merge branch 'main' into rendered-sst2-dataset

409dcad

NicolasHug approved these changes Jan 19, 2022


jdsgomes added 2 commits January 19, 2022 17:14

Address PR comments

78f5e45

Fix bug in dataset verification

e6c95ad

jdsgomes mentioned this pull request Jan 19, 2022

New classification datasets support for FLAVA #5108

Closed

14 tasks

Merge branch 'main' into rendered-sst2-dataset

39b2441

NicolasHug merged commit e32b19e into pytorch:main Jan 20, 2022

NicolasHug added module: datasets new feature labels Jan 20, 2022
