Add GTSRB dataset to prototypes #5214

NicolasHug · 2022-01-19T11:38:19Z

This PR adds the prototype version of the GTSRB dataset

cc @pmeier @bjuncek

facebook-github-bot · 2022-01-19T11:38:26Z

💊 CI failures summary and remediations

As of commit 8283332 (more details on the Dr. CI page):

1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

unittest_linux_cpu_py3.7 (1/1)

Step: "Run tests"

/root/project/torchvision/io/video.py:406: Runt...log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found

test/test_image.py::test_decode_png[L-ImageReadMode.GRAY-palette_pytorch.png]
test/test_image.py::test_decode_png[RGB-ImageReadMode.RGB-palette_pytorch.png]
  /root/project/env/lib/python3.7/site-packages/PIL/Image.py:946: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
    "Palette images with Transparency expressed in bytes should be "

test/test_io.py::TestVideo::test_probe_video_from_memory
  /root/project/torchvision/io/_video_opt.py:423: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1643011704494/work/torch/csrc/utils/tensor_new.cpp:998.)
    video_data = torch.frombuffer(video_data, dtype=torch.uint8)

test/test_io.py::TestVideo::test_read_video_timestamps_corrupted_file
  /root/project/torchvision/io/video.py:406: RuntimeWarning: Failed to open container for /tmp/tmpm6bt05c7.mp4; Caught error: [Errno 1094995529] Invalid data found when processing input: '/tmp/tmpm6bt05c7.mp4'; last error log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found
    warnings.warn(msg, RuntimeWarning)

test/test_models.py::test_memory_efficient_densenet[densenet121]
test/test_models.py::test_memory_efficient_densenet[densenet169]
test/test_models.py::test_memory_efficient_densenet[densenet201]
test/test_models.py::test_memory_efficient_densenet[densenet161]
  /root/project/env/lib/python3.7/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
    warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

test/test_models.py::test_inception_v3_eval

This comment was automatically generated by Dr. CI.

pmeier

Thanks a lot @NicolasHug! I've only reviewed the prototype part, let me know if I also need to look at the other changes.

torchvision/prototype/datasets/_builtin/gtsrb.py

torchvision/prototype/datasets/_builtin/gtsrb.categories

torchvision/prototype/datasets/_builtin/gtsrb.py

test/builtin_dataset_mocks.py

NicolasHug · 2022-01-19T18:10:11Z

test/datasets_utils.py

@@ -877,7 +877,7 @@ def _make_archive(root, name, *files_or_dirs, opener, adder, remove=True):
    files, dirs = _split_files_or_dirs(root, *files_or_dirs)

    with opener(archive) as fh:
-        for file in files:
+        for file in sorted(files):


@pmeier LMK what you think of this.

Down below I set buffer_size=1 in the Zipper, because in the original .zip files, both the image_dp and the gt_dp are fully aligned: they both contain images 00001, 00002, etc. in this order. So I'm assuming that buffer_size=1 is better than buffer_size=UNLIMITED?

Without this call to sorted(), the tests would fail: the .zip archive created by make_zip would contain the files in a shuffled order (because files is a set), and so image_dp and the gt_dp would not be aligned anymore, leading to a failure to match keys in the Zipper. (Note: this is only a problem in the tests; the code works fine otherwise on my custom script iterating over the dataset).
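To illustrate why a shuffled archive breaks key matching with a buffer of size 1, here is a minimal plain-Python sketch. It is not the datapipe Zipper implementation: `zip_by_key`, the file names, and the labels are all hypothetical stand-ins that only mimic the idea of pairing two streams by key with a bounded buffer.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Iterable, Iterator, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def zip_by_key(
    left: Iterable[A],
    right: Iterable[B],
    left_key: Callable[[A], Hashable],
    right_key: Callable[[B], Hashable],
    buffer_size: int,
) -> Iterator[Tuple[A, B]]:
    """Pair each left item with the right item that has the same key.

    Right items that have not been matched yet are held in a buffer of at
    most `buffer_size` entries (hypothetical helper, for illustration only).
    """
    right_iter = iter(right)
    buffer: "OrderedDict[Hashable, B]" = OrderedDict()
    for item in left:
        key = left_key(item)
        while key not in buffer:
            if len(buffer) >= buffer_size:
                raise BufferError(f"buffer full while looking for key {key!r}")
            try:
                candidate = next(right_iter)
            except StopIteration:
                raise KeyError(key) from None
            buffer[right_key(candidate)] = candidate
        yield item, buffer.pop(key)


# Hypothetical, fully aligned streams: image names and (name, label) rows.
images = ["00001.ppm", "00002.ppm", "00003.ppm"]
rows = [("00001.ppm", 16), ("00002.ppm", 1), ("00003.ppm", 38)]

aligned = list(zip_by_key(images, rows, lambda n: n, lambda r: r[0], buffer_size=1))
print(aligned)
```

With aligned streams a buffer of one entry is enough. If `rows` arrives in a different order, the same call raises `BufferError` as soon as the first lookup needs a second buffered entry, which mirrors the failure described above.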

I hope this won't make other tests fail. This might not be a problem that we have right now, but perhaps something to keep in mind for the future: we might need the test archives to exactly match the order of the "original" archives.
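The effect of sorting before writing can be checked in isolation. The sketch below is not the test utility itself: `zip_member_order` and the file names are hypothetical, and it only shows that iterating `sorted(files)` fixes the member order of the resulting archive regardless of how the set happens to iterate.

```python
import io
import zipfile


def zip_member_order(names, *, sort):
    """Write empty members into an in-memory zip and return their stored order."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in (sorted(names) if sort else names):
            archive.writestr(name, b"")
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()


# A set, like `files` in _make_archive; its iteration order is not guaranteed.
files = {"00002.ppm", "00000.ppm", "00001.ppm"}

print(zip_member_order(files, sort=True))
# ['00000.ppm', '00001.ppm', '00002.ppm']
```

With `sort=False` the members come out in whatever order the set yields them, which can differ between runs because string hashing is randomized per process.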

pmeier · 2022-01-19

Good point, I didn't think of that. Do you know if the order of the returned paths of pathlib.Path.glob() is stable? If yes, we could simply replace the sets in _split_files_or_dirs with lists instead of sorting here.

NicolasHug · 2022-01-20

I'm not sure about Path.glob(). I know that glob.glob has no guaranteed order, but I don't think Path.glob() relies on it. Maybe the safest is to not assume a specific order.

BTW, slightly related, what was the reason to use sets instead of lists?

pmeier · 2022-01-20

> what was the reason to use sets instead of lists?

I think the reason was to avoid duplicates, but I don't remember if there was a case where I hit something like that.

pmeier · 2022-01-21

Have you tried using lists rather than sorting afterwards? If CI is not complaining for other datasets, I feel like that would be the better approach.

NicolasHug · 2022-01-21

I just saw this call to remove(), which might be the reason for using sets:

vision/test/datasets_utils.py, lines 862 to 863 in afda28a

    if root in dirs:
        dirs.remove(root)

I can still switch to lists if you'd like; I guess I would have to write something like

    dirs = [dir for dir in dirs if dir != root]

LMK which one you prefer.
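For reference, a list-based variant can keep both properties discussed in this thread: no duplicates and a stable order. This is only a sketch under assumptions: `unique_dirs_without_root` and the example paths are hypothetical and are not taken from test/datasets_utils.py.

```python
from pathlib import PurePosixPath


def unique_dirs_without_root(root, dirs):
    """Drop duplicates and `root` while keeping first-seen order."""
    # dict.fromkeys removes duplicates and, unlike a set, preserves insertion order.
    unique = list(dict.fromkeys(dirs))
    return [d for d in unique if d != root]


root = PurePosixPath("root")
dirs = [root / "b", root, root / "a", root / "b"]

print(unique_dirs_without_root(root, dirs))
```

The result still depends on the order in which the paths were collected, so this does not remove the need to decide whether the collection order itself is stable.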

torchvision/prototype/datasets/_builtin/gtsrb.py

pmeier

Only minor stuff inline. Thanks a lot @NicolasHug!

Besides that, one other question: given that the dataset also provides a bounding box for each image, shouldn't we also return it? For the test split we already load the data. For the training data, each image folder contains a GT-{label:05d}.csv file in the same format as the testing annotations. Basically, instead of Filtering the images, you could use a Demultiplexer to get an images datapipe and an annotations datapipe.
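The demultiplexing idea can be sketched without any datapipe machinery. The code below is an eager, plain-Python stand-in and not the Demultiplexer datapipe: `classify`, `demux`, and the example paths are hypothetical, and the folder layout is an assumption based on the GT-{label:05d}.csv naming mentioned above.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


def classify(path: str) -> Optional[int]:
    """Return 0 for images, 1 for annotation files, None for anything else."""
    suffix = PurePosixPath(path).suffix
    if suffix == ".ppm":
        return 0
    if suffix == ".csv":
        return 1
    return None


def demux(paths: Iterable[str], num_instances: int = 2) -> List[List[str]]:
    """Route each path to the output selected by `classify`; drop unclassified paths."""
    outputs: List[List[str]] = [[] for _ in range(num_instances)]
    for path in paths:
        index = classify(path)
        if index is not None:
            outputs[index].append(path)
    return outputs


# Hypothetical archive listing for two training classes.
listing = [
    "Training/00000/00000_00000.ppm",
    "Training/00000/GT-00000.csv",
    "Training/00001/00000_00000.ppm",
    "Training/00001/GT-00001.csv",
    "Training/Readme.txt",
]

images, annotations = demux(listing)
print(images)
print(annotations)
```

A real datapipe would do this lazily with a bounded buffer; the sketch collects everything into lists only to keep the routing logic visible.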

pmeier · 2022-01-21T08:03:32Z

torchvision/prototype/datasets/_builtin/README.md

torchvision/prototype/datasets/_builtin/gtsrb.py

Co-authored-by: Philip Meier <[email protected]>

torchvision/prototype/datasets/_builtin/gtsrb.py

Co-authored-by: Philip Meier <[email protected]>

test/test_prototype_builtin_datasets.py

Summary: Co-authored-by: Philip Meier <[email protected]> Reviewed By: jdsgomes, prabhat00155 Differential Revision: D33739393 fbshipit-source-id: f65df4355c53a2fed2534b4bbd3ce7c1aa0606e2

NicolasHug added 16 commits January 5, 2022 15:02

Change default of download for Food101 and DTD

237a707

WIP

bc3be4e

Merge branch 'main' of github.com:pytorch/vision into defaultdownload

85ca229

Set download default to False and put it at the end

87695d4

Keep stuff private

1e6e37d

GTSRB: train -> split. Also use pathlib

474546f

mypy

a38a18b

Remove split and partition for SUN397

d58ef16

mypy

5061141

mypy

6c02cff

Merge branch 'main' of github.com:pytorch/vision into gtsrb_prototype

d3cb34f

Merge branch 'defaultdownload' into gtsrb_prototype

d288c6c

WIP

521b75c

WIP

1c1ceb0

Merge branch 'main' of github.com:pytorch/vision into gtsrb_prototype

1b2ee27

WIP

4fdb976

pytorch-probot bot added the ciflow/default label Jan 19, 2022

facebook-github-bot added the cla signed label Jan 19, 2022

NicolasHug added module: datasets prototype and removed cla signed labels Jan 19, 2022

facebook-github-bot added the cla signed label Jan 19, 2022

NicolasHug added 4 commits January 19, 2022 13:05

Add tests

a6ae4c4

Add some types

761e5d7

lmao mypy you funny lad

1dd6efe

fix unpacking

a32ab88

pmeier reviewed Jan 19, 2022

NicolasHug added 2 commits January 19, 2022 15:13

Merge branch 'main' of github.com:pytorch/vision into gtsrb_prototype

862187a

Use DictWriter

e487828

NicolasHug added 5 commits January 19, 2022 15:28

Split URL root

9ac22d3

Use name instead of stem

1f1fa35

Add category to labels, and fix dict reading

f25a83a

Use path_comparator

52ec648

Use buffer_size=1

379876f

NicolasHug commented Jan 19, 2022

torchvision/prototype/datasets/_builtin/gtsrb.py (outdated)

NicolasHug added 4 commits January 20, 2022 12:24

Merge branch 'main' of github.com:pytorch/vision into gtsrb_prototype

632c212

Merge branch 'main' of github.com:pytorch/vision into gtsrb_prototype

0d6b58d

Use Zipper instead of IterKeyZipper

e26b456

mypy

b958b6b

pmeier self-requested a review January 20, 2022 15:42

NicolasHug added 2 commits January 20, 2022 17:01

Some more instructions

06c0904

forgot backquotes

18b87e2

pmeier approved these changes Jan 21, 2022

NicolasHug and others added 4 commits January 21, 2022 11:15

Apply suggestions from code review

44bb8f1

Co-authored-by: Philip Meier <[email protected]>

gt -> ground_truth

c1ec16d

e -> sample

ff78c70

Add support for bboxes

cd38e25

pmeier reviewed Jan 21, 2022

torchvision/prototype/datasets/_builtin/gtsrb.py (outdated)

NicolasHug and others added 4 commits January 21, 2022 15:44

Update torchvision/prototype/datasets/_builtin/gtsrb.py

1e8aea6

Co-authored-by: Philip Meier <[email protected]>

format

8e9a617

Remove unused method

6703710

Add test for label matching

6b67ce7

NicolasHug commented Jan 21, 2022

test/test_prototype_builtin_datasets.py (outdated)

NicolasHug added 2 commits January 24, 2022 09:48

Update test/test_prototype_builtin_datasets.py

1ef84e0

Merge branch 'main' into gtsrb_prototype

8283332

NicolasHug merged commit 508c79d into pytorch:main Jan 24, 2022

pmeier mentioned this pull request Mar 23, 2022

add instructions how to add tests for prototype datasets #5666

Merged
