Add support for PCAM dataset #5203

NicolasHug · 2022-01-17T14:16:26Z

Towards #5108

This PR adds support for the PCAM dataset.

cc @pmeier

facebook-github-bot · 2022-01-17T14:16:33Z

💊 CI failures summary and remediations

As of commit 8a3dd39 (more details on the Dr. CI page):

✅ None of the CI failures appear to be your fault 💚

1/1 broken upstream at merge base 5e56575 since Jan 14

🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet.

unittest_prototype since Jan 14 (adf8466)

This comment was automatically generated by Dr. CI.

NicolasHug · 2022-01-17T14:19:31Z

torchvision/datasets/pcam.py

+
+    def __len__(self) -> int:
+        images_file = self._FILES[self._split]["images"][0]
+        with self.h5py.File(self._base_folder / images_file) as images_data:


Note for here and below: opening an h5py File does not load its data into memory, so the operation is cheap and fast.

Similarly, accessing a single row of the file below does not load the entire file, just the section of it that holds that row.

I guess we could open the files in __init__ and keep the handles around, but I'm not sure it would be any faster, and we might never be able to close the handles properly.
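To illustrate the "reopen on every access, read only what is needed" pattern without depending on h5py, here is a minimal standard-library sketch. It is an analogy, not the PCAM implementation: the flat binary file of fixed-size records, the FixedRecordFile class and the write_records helper are all hypothetical. Only the path is stored in __init__, __len__ derives the count from metadata (here the file size) rather than reading records, and __getitem__ opens the file, seeks to one record and reads just that record. h5py gives the same effect for HDF5 datasets by reading only the part of the file covering the requested slice.

```python
import os
import struct
import tempfile
from pathlib import Path

# Hypothetical record layout: a uint8 label followed by a little-endian float32.
RECORD = struct.Struct("<Bf")


class FixedRecordFile:
    """Hypothetical reader that opens the file on every access (no long-lived handle)."""

    def __init__(self, path: Path) -> None:
        # Only the path is kept; nothing is opened or read here.
        self._path = path

    def __len__(self) -> int:
        # The record count comes from the file size; no record data is read.
        return os.path.getsize(self._path) // RECORD.size

    def __getitem__(self, idx: int):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Open, jump straight to the requested record, read exactly one record, close.
        with open(self._path, "rb") as f:
            f.seek(idx * RECORD.size)
            return RECORD.unpack(f.read(RECORD.size))


def write_records(path: Path, records) -> None:
    """Hypothetical helper that writes (label, value) pairs as fixed-size records."""
    with open(path, "wb") as f:
        for label, value in records:
            f.write(RECORD.pack(label, value))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "records.bin"
        write_records(path, [(0, 0.5), (1, 1.25), (0, 2.0)])
        ds = FixedRecordFile(path)
        print(len(ds), ds[1])  # prints: 3 (1, 1.25)
```

Because no handle outlives a call, there is nothing to close when the object is discarded, which is the concern raised above about keeping handles from __init__.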

NicolasHug · 2022-01-17T14:21:57Z

torchvision/datasets/pcam.py

+    _FILES = {
+        "train": {
+            "images": (
+                "camelyonpatch_level_2_split_train_x.h5",  # Data file name
+                "1Ka0XfEMiwgCYPdTI-vv6eUElOBnKFKQ2",  # Google Drive ID
+                "1571f514728f59376b705fc836ff4b63",  # md5 hash
+            ),


I'm not ecstatic about this big dict, but I needed everything in the same place to support per-split download logic (i.e. only download the test data if we need neither train nor val).
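A minimal sketch of what keying the table by split buys, assuming a hypothetical files_for_split helper (not torchvision's actual code). Only the train "images" tuple below is copied from the diff; the "targets" key and every other entry are hypothetical placeholders. Given the requested split and the file names already on disk, the helper returns just the (file name, Google Drive ID, md5) tuples that still need downloading for that split.

```python
from typing import Dict, List, Set, Tuple

# (file name, Google Drive ID, md5), the same tuple layout as the _FILES dict above.
FileInfo = Tuple[str, str, str]

# Hypothetical, trimmed-down table: only the train "images" tuple comes from the diff.
FILES: Dict[str, Dict[str, FileInfo]] = {
    "train": {
        "images": (
            "camelyonpatch_level_2_split_train_x.h5",
            "1Ka0XfEMiwgCYPdTI-vv6eUElOBnKFKQ2",
            "1571f514728f59376b705fc836ff4b63",
        ),
        "targets": ("train_targets_placeholder.h5", "<drive-id>", "<md5>"),
    },
    "val": {
        "images": ("val_images_placeholder.h5", "<drive-id>", "<md5>"),
        "targets": ("val_targets_placeholder.h5", "<drive-id>", "<md5>"),
    },
    "test": {
        "images": ("test_images_placeholder.h5", "<drive-id>", "<md5>"),
        "targets": ("test_targets_placeholder.h5", "<drive-id>", "<md5>"),
    },
}


def files_for_split(split: str, already_present: Set[str]) -> List[FileInfo]:
    """Return the files the requested split needs that are not yet on disk."""
    if split not in FILES:
        raise ValueError(f"Unknown split {split!r}; expected one of {sorted(FILES)}")
    # Only this split's entries are considered, so other splits are never fetched.
    return [info for info in FILES[split].values() if info[0] not in already_present]
```

For example, files_for_split("test", set()) yields only the two test entries and never touches train or val, which is the point of grouping all per-file metadata under the split key.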

pmeier

Thanks @NicolasHug! I have a few minor nits inline. Otherwise LGTM when CI is green.

pmeier · 2022-01-17T14:17:51Z

torchvision/datasets/oxford_iiit_pet.py

+        download (bool, optional): If True, downloads the dataset from the internet and puts it into
+            ``root/oxford-iiit-pet``. If dataset is already downloaded, it is not downloaded again.



Summary:
* Add support for PCAM dataset
* mypy
* Apply suggestions from code review
* Remove classes and class_to_idx attributes
* Use _decompress

Reviewed By: datumbox, NicolasHug

Differential Revision: D33655258

fbshipit-source-id: a38e55340ab3c364969160f3c186d1a130bdc371

Co-authored-by: Philip Meier <[email protected]>

Add support for PCAM dataset

a1c7744

NicolasHug added module: datasets new feature labels Jan 17, 2022

facebook-github-bot added the cla signed label Jan 17, 2022

pytorch-probot bot added the ciflow/default label Jan 17, 2022

NicolasHug mentioned this pull request Jan 17, 2022

New classification datasets support for FLAVA #5108 (closed, 14 tasks)

NicolasHug commented Jan 17, 2022

pmeier approved these changes Jan 17, 2022

NicolasHug and others added 5 commits January 17, 2022 14:38

mypy

8a0dfb4

Apply suggestions from code review

3ba4d82

Co-authored-by: Philip Meier <[email protected]>

Remove classes and class_to_idx attributes

95044d6

Use _decompress

f95f64e

Merge branch 'main' of github.com:pytorch/vision into pcam

8a3dd39

NicolasHug merged commit 4946827 into pytorch:main Jan 17, 2022
