
Add GTSRB dataset to prototypes #5214


Merged · 44 commits · merged into pytorch:main on Jan 24, 2022
Conversation

@NicolasHug (Member) commented Jan 19, 2022

This PR adds the prototype version of the GTSRB dataset

cc @pmeier @bjuncek

@facebook-github-bot commented Jan 19, 2022

💊 CI failures summary and remediations

As of commit 8283332 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build unittest_linux_cpu_py3.7 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

/root/project/torchvision/io/video.py:406: Runt...log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found
test/test_image.py::test_decode_png[L-ImageReadMode.GRAY-palette_pytorch.png]
test/test_image.py::test_decode_png[RGB-ImageReadMode.RGB-palette_pytorch.png]
  /root/project/env/lib/python3.7/site-packages/PIL/Image.py:946: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
    "Palette images with Transparency expressed in bytes should be "

test/test_io.py::TestVideo::test_probe_video_from_memory
  /root/project/torchvision/io/_video_opt.py:423: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1643011704494/work/torch/csrc/utils/tensor_new.cpp:998.)
    video_data = torch.frombuffer(video_data, dtype=torch.uint8)

test/test_io.py::TestVideo::test_read_video_timestamps_corrupted_file
  /root/project/torchvision/io/video.py:406: RuntimeWarning: Failed to open container for /tmp/tmpm6bt05c7.mp4; Caught error: [Errno 1094995529] Invalid data found when processing input: '/tmp/tmpm6bt05c7.mp4'; last error log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found
    warnings.warn(msg, RuntimeWarning)

test/test_models.py::test_memory_efficient_densenet[densenet121]
test/test_models.py::test_memory_efficient_densenet[densenet169]
test/test_models.py::test_memory_efficient_densenet[densenet201]
test/test_models.py::test_memory_efficient_densenet[densenet161]
  /root/project/env/lib/python3.7/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
    warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

test/test_models.py::test_inception_v3_eval

This comment was automatically generated by Dr. CI.

@pmeier (Collaborator) left a comment

Thanks a lot @NicolasHug! I've only reviewed the prototype part, let me know if I also need to look at the other changes.

@@ -877,7 +877,7 @@ def _make_archive(root, name, *files_or_dirs, opener, adder, remove=True):
     files, dirs = _split_files_or_dirs(root, *files_or_dirs)
 
     with opener(archive) as fh:
-        for file in files:
+        for file in sorted(files):
@NicolasHug (Member, Author)

@pmeier LMK what you think of this.

Down below I set buffer_size=1 in the Zipper because, in the original .zip files, image_dp and gt_dp are fully aligned: they both contain images 00001, 00002, etc. in that order. So I'm assuming that buffer_size=1 is better than buffer_size=UNLIMITED?

Without this call to sorted(), the tests would fail: the .zip archive created by make_zip would contain the files in a shuffled order (because files is a set), so image_dp and gt_dp would no longer be aligned, leading to a failure to match keys in the Zipper. (Note: this is only a problem in the tests; the code works fine in my custom script iterating over the dataset.)

I hope this won't make other tests fail. This might not be a problem we have right now, but it's something to keep in mind for the future: we may need the test archives to exactly match the order of the "original" archives.
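For context, here is a minimal plain-Python sketch of what the key-matching zip has to do; the toy streams and the zip_by_key helper below are illustrative, not the torchdata implementation:

```python
def zip_by_key(primary, reference, key_fn, ref_key_fn, buffer_size):
    """Pair each item of `primary` with the item of `reference` that has the
    same key, buffering at most `buffer_size` unmatched reference items."""
    buffer = {}
    ref_iter = iter(reference)
    for item in primary:
        key = key_fn(item)
        while key not in buffer:
            ref = next(ref_iter)  # raises StopIteration if no match exists
            buffer[ref_key_fn(ref)] = ref
            if len(buffer) > buffer_size:
                raise RuntimeError("buffer exceeded: streams are not aligned")
        yield item, buffer.pop(key)

# If both streams list 00001, 00002, ... in the same order, a buffer of 1 is
# enough; a shuffled archive breaks the alignment and the buffer overflows,
# which is roughly what the un-sorted test archives triggered.
images = [("00001.ppm", "img1"), ("00002.ppm", "img2")]
labels = [("00001.ppm", 3), ("00002.ppm", 7)]
pairs = list(zip_by_key(images, labels, key_fn=lambda x: x[0],
                        ref_key_fn=lambda x: x[0], buffer_size=1))
```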

@pmeier (Collaborator) commented Jan 19, 2022

Good point, I didn't think of that. Do you know if the order of the returned paths of pathlib.Path.glob() is stable? If yes, we could simply replace the sets in _split_files_or_dirs with lists instead of sorting here.

@NicolasHug (Member, Author)

I'm not sure about Path.glob(). I know that glob.glob has no guaranteed order, but I don't think Path.glob() relies on it. Maybe the safest is to not assume a specific order.
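For reference, glob.glob documents that results come back in arbitrary order, and pathlib.Path.glob makes no ordering guarantee either, so an explicit sort is the portable way to get determinism. A tiny illustration (the path and pattern are hypothetical):

```python
from pathlib import Path

# Hypothetical GTSRB layout; the point is only the explicit sort.
root = Path("GTSRB/Final_Training/Images")
image_paths = sorted(root.glob("*/*.ppm"))  # deterministic regardless of filesystem order
```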

BTW, slightly related, what was the reason to use sets instead of lists?

@pmeier (Collaborator)

what was the reason to use sets instead of lists?

I think the reason was to avoid duplicates, but I don't remember if there was a case where I actually hit that.

@pmeier (Collaborator)

Have you tried using lists rather than sorting afterwards? If CI is not complaining for other datasets, I feel like that would be the better approach.

@NicolasHug (Member, Author)

I just saw this call to remove(), which might be the reason for using sets:

if root in dirs:
    dirs.remove(root)

I can still switch to lists if you'd like; I guess I would have to write something like

dirs = [dir for dir in dirs if dir != root]

LMK which one you prefer
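For concreteness, a small sketch of the two variants being discussed (this paraphrases the idea, not the actual _split_files_or_dirs implementation):

```python
from pathlib import Path

def drop_root_with_sets(root: Path, dirs: set) -> set:
    # current approach: sets deduplicate for free, but iteration order is arbitrary
    if root in dirs:
        dirs.remove(root)
    return dirs

def drop_root_with_lists(root: Path, dirs: list) -> list:
    # list-based alternative: keeps insertion order; dedup and removal are explicit
    deduped = list(dict.fromkeys(dirs))       # order-preserving dedup
    return [d for d in deduped if d != root]  # drop the root entry
```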

@pmeier pmeier self-requested a review January 20, 2022 15:42
@pmeier (Collaborator) left a comment

Only minor stuff inline. Thanks a lot @NicolasHug!

Besides that, one other question: given that the dataset also provides a bounding box for each image, shouldn't we also return it? For the test split we already load that data. For the training data, each image folder contains a GT-{label:05d}.csv file in the same format as the test annotations. Basically, instead of filtering the images, you could use a Demultiplexer to get an images and an annotations datapipe.
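A rough sketch of that idea (the classifier below and the demux() wiring are assumptions about how it could look with torchdata's Demultiplexer, not the final implementation):

```python
import pathlib

def classify_gtsrb(data):
    # data is a (path, file_handle) tuple as yielded by the archive datapipe
    path = pathlib.Path(data[0])
    if path.suffix == ".ppm":
        return 0  # image
    if path.name.startswith("GT-") and path.suffix == ".csv":
        return 1  # per-folder annotation file with labels and bounding boxes
    return None  # everything else gets dropped

assert classify_gtsrb(("Images/00000/00000_00000.ppm", None)) == 0
assert classify_gtsrb(("Images/00000/GT-00000.csv", None)) == 1

# Hypothetical wiring:
# images_dp, anns_dp = archive_dp.demux(
#     num_instances=2, classifier_fn=classify_gtsrb, drop_none=True
# )
```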


@NicolasHug merged commit 508c79d into pytorch:main Jan 24, 2022
facebook-github-bot pushed a commit that referenced this pull request Jan 26, 2022
Summary: Co-authored-by: Philip Meier <[email protected]>

Reviewed By: jdsgomes, prabhat00155

Differential Revision: D33739393

fbshipit-source-id: f65df4355c53a2fed2534b4bbd3ce7c1aa0606e2