Cleanups for FLAVA datasets #5164

NicolasHug · 2022-01-05T15:04:03Z

Towards the end of #5108

All datasets but two default to download=False, so for consistency this PR also sets the default to False for Food101 and DTD. It also documents the download parameter for Food101, which was missing from the docstring.

See #5164 (comment) for the complete set of changes.

cc @pmeier

facebook-github-bot · 2022-01-05T15:04:10Z

💊 CI failures summary and remediations

As of commit 3c70d81 (more details on the Dr. CI page):

1/2 failures introduced in this PR
1/2 broken upstream at merge base 563d9ca since Jan 14

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

unittest_linux_cpu_py3.7 (1/1)

Step: "Run tests"

/root/project/torchvision/io/video.py:406: Runt...log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found

test/test_image.py::test_decode_png[L-ImageReadMode.GRAY-palette_pytorch.png]
test/test_image.py::test_decode_png[RGB-ImageReadMode.RGB-palette_pytorch.png]
  /root/project/env/lib/python3.7/site-packages/PIL/Image.py:946: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
    "Palette images with Transparency expressed in bytes should be "

test/test_io.py::TestVideo::test_probe_video_from_memory
  /root/project/torchvision/io/_video_opt.py:423: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1642552271656/work/torch/csrc/utils/tensor_new.cpp:998.)
    video_data = torch.frombuffer(video_data, dtype=torch.uint8)

test/test_io.py::TestVideo::test_read_video_timestamps_corrupted_file
  /root/project/torchvision/io/video.py:406: RuntimeWarning: Failed to open container for /tmp/tmprbxm6fs4.mp4; Caught error: [Errno 1094995529] Invalid data found when processing input: '/tmp/tmprbxm6fs4.mp4'; last error log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found
    warnings.warn(msg, RuntimeWarning)

test/test_models.py::test_memory_efficient_densenet[densenet121]
test/test_models.py::test_memory_efficient_densenet[densenet169]
test/test_models.py::test_memory_efficient_densenet[densenet201]
test/test_models.py::test_memory_efficient_densenet[densenet161]
  /root/project/env/lib/python3.7/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
    warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

test/test_models.py::test_inception_v3_eval

🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet.

unittest_prototype since Jan 14 (adf8466)
This comment was automatically generated by Dr. CI.

NicolasHug · 2022-01-05T15:30:52Z

Thanks for the review @pmeier. Following up on your #5130 (comment), let's use this PR to:

- make sure all datasets have download=False as the default
- make sure all datasets have download after the transforms parameter
- change the train parameter into split for consistency across these datasets
- ~~change name of DTD parameter partition~~ We kept partition in DTD and instead removed it from SUN397, along with its split parameter, because the train/test splits are only defined relative to a partition. Since each partition only contains a subset of the data, we decided not to include it, at least for now. On top of that, our goal is to support the FLAVA implementation, and the original implementation (https://github.com/facebookresearch/vissl/blob/main/extra_scripts/datasets/create_sun397_data_files.py#L92) relies on a custom-made split. Because this split is arbitrary and non-standard, it isn't something we can support directly in torchvision.

I'll mark it as draft and we can come back to this once the rest of the PRs are merged.

EDIT: "all datasets" == all datasets that haven't been released yet.
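
As a rough illustration of the conventions listed above, a constructor that follows them might look like the sketch below. `ExampleDataset`, its `_SPLITS` tuple, and the `example` sub-folder are hypothetical names made up for this example and are not torchvision code; only the parameter order and defaults reflect the points in this comment.

```python
from pathlib import Path
from typing import Any, Callable, Optional


class ExampleDataset:
    """Hypothetical dataset used only to illustrate the parameter conventions."""

    _SPLITS = ("train", "test")  # assumed split names for this sketch

    def __init__(
        self,
        root: str,
        split: str = "train",  # a string `split` instead of a boolean `train`
        transform: Optional[Callable[[Any], Any]] = None,
        target_transform: Optional[Callable[[Any], Any]] = None,
        download: bool = False,  # comes after the transforms and defaults to False
    ) -> None:
        if split not in self._SPLITS:
            raise ValueError(f"Unknown split {split!r}, expected one of {self._SPLITS}")
        self._split = split
        self._base_folder = Path(root) / "example"  # kept private
        self.transform = transform
        self.target_transform = target_transform
        if download:
            self._download()

    def _download(self) -> None:
        # Placeholder: a real dataset would fetch and extract its archives here.
        self._base_folder.mkdir(parents=True, exist_ok=True)
```

With this layout, `ExampleDataset("data", split="test")` never touches the network or the disk unless the caller opts in with `download=True`.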

pmeier · 2022-01-06T07:10:55Z

@NicolasHug You are only talking about the "FLAVA" datasets here, right? Because for other datasets that would be BC breaking and I want to avoid that, since we probably don't have time for a deprecation cycle before the API is deprecated in general.

NicolasHug · 2022-01-06T10:21:19Z

Fully agreed @pmeier, sorry for not being clearer in my comment above.

pmeier · 2022-01-17T16:45:03Z

vision/torchvision/datasets/pcam.py

Line 78 in 4946827

import h5py # type: ignore[import]

is an anti-pattern. Since h5py has no annotations, it is better to ignore it globally rather than locally like

vision/mypy.ini

Lines 117 to 119 in 4946827

    
           [mypy-torchdata.*] 
        
           ignore_missing_imports = True

pmeier

One minor comment inline. Plus let's also resolve #5220 (comment). Otherwise, LGTM if CI is green! Thanks @NicolasHug

torchvision/datasets/gtsrb.py

pmeier · 2022-01-20T10:46:13Z

torchvision/datasets/rendered_sst2.py

-        for p in (self._base_folder / self._split_to_folder[self._split]).glob("**/*.png"):
-            self._labels.append(self.class_to_idx[p.parent.name])
-            self._image_files.append(p)
+        self._samples = make_dataset(str(self._base_folder / self._split_to_folder[self._split]), extensions="png")
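
The change above swaps two parallel lists (`_labels` and `_image_files`) for a single list of samples. Below is a minimal, stdlib-only sketch of that idea, assuming each sample is a (path, class index) pair. `collect_samples` is a hypothetical stand-in written for this example and is not torchvision's `make_dataset`; the folder and file names are made up as well.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def collect_samples(folder: Path, class_to_idx: Dict[str, int]) -> List[Tuple[str, int]]:
    """Hypothetical helper: pair every PNG with the index of its parent folder."""
    return [
        (str(p), class_to_idx[p.parent.name])
        for p in sorted(folder.glob("**/*.png"))
    ]


# Build a tiny throwaway directory tree: <root>/<class name>/<image>.png
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for class_name, file_name in [("negative", "a.png"), ("positive", "b.png")]:
        (root / class_name).mkdir()
        (root / class_name / file_name).touch()
    (root / "positive" / "notes.txt").touch()  # ignored: not a PNG

    samples = collect_samples(root, {"negative": 0, "positive": 1})
    labels = [label for _, label in samples]
```

Keeping path and label together in one list means indexing a sample can never pick up a mismatched pair, which is one reason a single `_samples` attribute is easier to maintain than two lists.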


Suggested change

-        self._samples = make_dataset(str(self._base_folder / self._split_to_folder[self._split]), extensions="png")
+        self._samples = make_dataset(str(self._base_folder / self._split_to_folder[self._split]), extensions=(".png",))

I believe it ultimately comes down to endswith which accepts tuples but also just plain strings.
I think the type annotations are incorrect here.

Let me send a PR to fix that.
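
For reference, the `str.endswith` behavior mentioned above can be checked with a few lines of plain Python. `has_allowed_extension` is a hypothetical helper written for this sketch, not the function torchvision uses; it only shows that a plain string and a tuple of strings are both accepted, and how they differ when the leading dot is missing.

```python
from typing import Tuple, Union


def has_allowed_extension(filename: str, extensions: Union[str, Tuple[str, ...]]) -> bool:
    """Hypothetical check: `str.endswith` takes one suffix or a tuple of suffixes."""
    return filename.lower().endswith(extensions)


# A plain string and a one-element tuple both match a PNG file name.
assert has_allowed_extension("image.png", "png")
assert has_allowed_extension("image.PNG", (".png",))

# Without the leading dot, a bare "png" suffix also matches unrelated names.
assert has_allowed_extension("notapng", "png")
assert not has_allowed_extension("notapng", (".png",))

# A list is rejected by `str.endswith`, so several suffixes must be a tuple.
try:
    "image.png".endswith([".png"])
except TypeError:
    list_is_rejected = True
else:
    list_is_rejected = False
```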

NicolasHug · 2022-01-20T12:19:29Z

Failure is unrelated, I'll merge. Thanks for the review!

Summary:
* Change default of download for Food101 and DTD
* Set download default to False and put it at the end
* Keep stuff private
* GTSRB: train -> split. Also use pathlib
* mypy
* Remove split and partition for SUN397
* mypy
* mypy
* move download param for SST2
* Use make_dataset in SST2
* Use a base URL for GTSRB
* Let's make this code more complictaed than it needs to be because why not

Reviewed By: jdsgomes, prabhat00155

Differential Revision: D33739381

fbshipit-source-id: a2bcfcdc2296ffe62f8e75c8107ff1d0a87957f1

Change default of download for Food101 and DTD

237a707

NicolasHug added the "module: datasets" and "other" (if you have no clue or if you will manually handle the PR in the release notes) labels Jan 5, 2022

pytorch-probot bot added the ciflow/default label Jan 5, 2022

facebook-github-bot added the cla signed label Jan 5, 2022

pmeier approved these changes Jan 5, 2022

pmeier mentioned this pull request Jan 5, 2022

add CLEVR dataset #5130

Merged

NicolasHug marked this pull request as draft January 5, 2022 15:31

Merge branch 'main' of github.com:pytorch/vision into defaultdownload

85ca229

NicolasHug changed the title ~~Change default of download for Food101 and DTD~~ Cleanups for FLAVA datasets Jan 18, 2022

NicolasHug added 7 commits January 18, 2022 10:58

Set download default to False and put it at the end

87695d4

Keep stuff private

1e6e37d

GTSRB: train -> split. Also use pathlib

474546f

mypy

a38a18b

Remove split and partition for SUN397

d58ef16

mypy

5061141

mypy

6c02cff

NicolasHug marked this pull request as ready for review January 18, 2022 15:09

pmeier self-requested a review January 18, 2022 16:14

This was referenced Jan 19, 2022

Add GTSRB dataset to prototypes #5214

Merged

Add Rendered sst2 dataset #5220

Merged

NicolasHug added 2 commits January 20, 2022 10:22

Merge branch 'main' of github.com:pytorch/vision into defaultdownload

cca0fb7

move download param for SST2

194b55d

pmeier approved these changes Jan 20, 2022

torchvision/datasets/gtsrb.py (outdated, resolved)

Use make_dataset in SST2

78e52c5

Use a base URL for GTSRB

dc7c166

pmeier reviewed Jan 20, 2022

Let's make this code more complictaed than it needs to be because why not

3c70d81

pmeier mentioned this pull request Jan 20, 2022

allow single extension as str in make_dataset #5229

Merged

NicolasHug merged commit e047623 into pytorch:main Jan 20, 2022

YosuaMichael mentioned this pull request Mar 23, 2022

Add sun397 prototype datapipe #5667

Open
