
Add SUN397 Dataset #5132


Merged
merged 8 commits into from
Jan 7, 2022

Conversation

saswatpp
Contributor

@saswatpp saswatpp commented Dec 27, 2021

Adds SUN397 dataset to address: #5108.

The dataset is 39GB, so a download option was added. The 10 official training and testing partitions can be loaded via the partition argument (valid values: an int from 1 to 10, or None for the entire dataset). Is this helpful, @pmeier? Official Partitions
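The official partitions ship as plain text files (Training_01.txt through Training_10.txt, with matching Testing_*.txt files), each listing image paths, so loading a partition amounts to reading the matching file. A minimal stdlib-only sketch of that idea (the helper and file layout below are illustrative, not the PR's actual code):

```python
import os
import tempfile
from pathlib import Path


def load_partition_paths(root, split, partition):
    """Read the image paths listed in one official SUN397 partition file.

    `split` is "train" or "test"; `partition` is an int from 1 to 10.
    Illustrative helper only, not the dataset class from this PR.
    """
    prefix = "Training" if split == "train" else "Testing"
    partition_file = Path(root) / "Partitions" / f"{prefix}_{partition:02d}.txt"
    with open(partition_file) as f:
        return [line.strip() for line in f if line.strip()]


# Demo with a fake partition file:
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "Partitions"))
with open(os.path.join(tmp, "Partitions", "Training_01.txt"), "w") as f:
    f.write("/a/abbey/sun_001.jpg\n/a/abbey/sun_002.jpg\n")
print(load_partition_paths(tmp, "train", 1))
```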

Thank you.

cc @pmeier

@facebook-github-bot

facebook-github-bot commented Dec 27, 2021

💊 CI failures summary and remediations

As of commit 69ee35e (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


Collaborator

@pmeier pmeier left a comment


Hey @saswatpp and thanks a lot for the PR! I have some comments inline that make the implementation (hopefully) a little cleaner. On top of that, you need to run our auto-formatters on your code to pass CI. Have a look at our contributing guide. In short, you can do the following:

pip install pre-commit
pre-commit install  # activates the auto-formatter for each subsequent commit
pre-commit run --all-files  # only needed once, since you already committed
git commit -am "fix code format"

Also, I couldn't understand what exactly the tests for the datasets do just by looking at the code. Do we have some resources for that?

The tests for the datasets are located in test/test_datasets.py. From the contributor's perspective, you need to add a new test case that configures some basics and provides fake data.

class SUN397TestCase(datasets_utils.ImageDatasetTestCase):
    DATASET_CLASS = datasets.SUN397

    ADDITIONAL_CONFIGS = datasets_utils.combinations_grid(
        split=("train", "test"),
        # There is no need to test all individual partitions, since they all behave the same
        partition=(1, 10, "all"),
    )

    def inject_fake_data(self, tmp_dir, config):
        ...

inject_fake_data should prepare tmp_dir (in your case the same as root in the dataset) in a way that the structure of the files is equal to what would be there if I had instantiated the dataset with download=True. It needs to return the number of samples in the dataset. If you want to know more, you can read the documentation of the underlying test case.
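For instance, inject_fake_data could lay out a miniature SUN397 tree under tmp_dir and return the sample count. The sketch below uses only the stdlib and writes placeholder bytes instead of real JPEGs (in the real test case, valid image files would be needed, and the class names and file layout shown here are assumptions for illustration, not the merged test):

```python
import tempfile
from pathlib import Path


def make_fake_sun397(tmp_dir, num_images_per_class=3):
    """Create a minimal fake SUN397 directory tree and return the sample count.

    Illustrative only: placeholder bytes stand in for JPEG data; the real
    test would need decodable images.
    """
    data_dir = Path(tmp_dir) / "SUN397"
    classes = ["/a/abbey", "/b/bakery/shop"]  # class paths as in ClassName.txt
    for cls in classes:
        class_dir = data_dir / cls.lstrip("/")
        class_dir.mkdir(parents=True, exist_ok=True)
        for i in range(num_images_per_class):
            (class_dir / f"sun_{i:06d}.jpg").write_bytes(b"fake jpeg data")
    # ClassName.txt lists one class path per line
    (data_dir / "ClassName.txt").write_text("\n".join(classes) + "\n")
    return len(classes) * num_images_per_class


tmp = tempfile.mkdtemp()
num_samples = make_fake_sun397(tmp)
print(num_samples)  # 2 classes * 3 images = 6
```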

@saswatpp
Contributor Author

@pmeier Can you tell me how to run the tests for a specific dataset? Thank you.

@pmeier
Collaborator

pmeier commented Dec 30, 2021

The easiest would be to run

$ pytest test/test_datasets.py -k sun397 

if the test case you wrote contains sun397 (case-insensitive) in its name.

@saswatpp
Contributor Author

saswatpp commented Jan 1, 2022

@pmeier, I had set FEATURE_TYPES = (PIL.Image.Image, int) in the test class, but the unit test still fails. Do you know what is going on?

Collaborator

@pmeier pmeier left a comment


Hey @saswatpp, I've suggested one change to the label generation. With that, mypy is happy and I can't reproduce the test error locally. Let's see if CI is happy too.

Collaborator

@pmeier pmeier left a comment


Thanks a lot @saswatpp for the PR. LGTM!

@pmeier pmeier requested a review from NicolasHug January 5, 2022 07:37
@saswatpp saswatpp changed the title from "[WIP] Add SUN397 Dataset" to "Add SUN397 Dataset" Jan 5, 2022
Member

@NicolasHug NicolasHug left a comment


Thanks a lot for the PR @saswatpp and @pmeier for the review. I made a few comments below, the download() function needs some minor fixes but other than that it looks great. LMK what you think.

@pmeier
Collaborator

pmeier commented Jan 6, 2022

@saswatpp Could you fix the merge conflicts? Afterwards this should be good to go.

@NicolasHug
Member

Afterwards this should be good to go

I'm wondering whether @saswatpp has pushed their changes yet? I see all comments as resolved, but I don't see any new changes in the diff 😅

@saswatpp
Contributor Author

saswatpp commented Jan 6, 2022

Oh @NicolasHug, I made the changes in my local repo but still have to commit them 😓

@NicolasHug
Member

No worries at all @saswatpp !

@saswatpp
Contributor Author

saswatpp commented Jan 6, 2022

Git was behaving weirdly, so I force-pushed.

Copy link
Member

@NicolasHug NicolasHug left a comment


Thanks a lot @saswatpp !

Running the tests locally, I'm seeing a lot of warnings about unclosed files:

test/test_datasets.py::LSUNTestCase::test_transforms
  /Users/nicolashug/dev/vision/torchvision/datasets/lsun.py:31: ResourceWarning: unclosed file <_io.BufferedWriter name='_cache_varfolderssyvkyzpyhrlqqcgnTtmpclcnsatowervallmdb'>
    pickle.dump(self.keys, open(cache_file, "wb"))

This is similar to what happened in #5116 (review); perhaps @pmeier can advise on how he fixed it over there?

Other than that, LGTM!

@pmeier
Collaborator

pmeier commented Jan 7, 2022

@NicolasHug As the error message implies, this is about files of the LSUN dataset and thus unrelated to this PR. The same warnings are present on the main branch. I'll look into it.
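For reference, warnings like this usually disappear once the file handle is closed deterministically instead of being left to the garbage collector. A minimal sketch of the pattern (generic illustration, not the actual lsun.py fix):

```python
import os
import pickle
import tempfile

keys = ["img_0001", "img_0002"]
cache_file = os.path.join(tempfile.mkdtemp(), "_cache_demo")

# Instead of pickle.dump(keys, open(cache_file, "wb")), which leaks the
# handle and triggers ResourceWarning, use a context manager so the file
# is closed as soon as the block exits:
with open(cache_file, "wb") as f:
    pickle.dump(keys, f)

with open(cache_file, "rb") as f:
    restored = pickle.load(f)
print(restored)  # ['img_0001', 'img_0002']
```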

@pmeier pmeier merged commit 8c546f6 into pytorch:main Jan 7, 2022
@github-actions

github-actions bot commented Jan 7, 2022

Hey @pmeier!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Jan 8, 2022
Summary:
* dataset class added

* fix code format

* fixed requested changes

* fixed issues in sun397

* Update torchvision/datasets/sun397.py

Reviewed By: sallysyw

Differential Revision: D33479277

fbshipit-source-id: 374d098c261adeacd073fae141380130a6c3aa95

Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Philip Meier <[email protected]>