remove decoding from prototype datasets #5287

pmeier · 2022-01-26T15:58:11Z

Same deal as #5283. Supersedes #5105.

Main change is to use the new encoded data feature types in the prototype datasets. This completely removes the need to pass a decoder to datasets.load. The decoding will be performed by a transform that will be added later on.
With that change, the canonical _collate_and_decode_sample method no longer decodes anything, so its name needs to change. Since "collate" is also the term for preparing a batch of data for model consumption, I've opted to rename it to _prepare_sample.
This also fixes the ImageNet validation split. We use a different order of the categories than the dataset for BC reasons. This was not considered in the current implementation. cc @datumbox
Add categories for the VOC dataset
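
To make the split between loading and decoding concrete, here is a minimal sketch. It is not the torchvision implementation: `EncodedImage`, `prepare_sample`, `decode_pgm`, and `decode_transform` are hypothetical names, and a tiny binary PGM parser stands in for a real image decoder so the example runs with the standard library only.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class EncodedImage:
    """Hypothetical stand-in for an 'encoded data' feature: raw bytes, not pixels."""

    data: bytes


def prepare_sample(path: str, raw: bytes, label: int) -> Dict[str, Any]:
    # Dataset side: wrap the bytes untouched. No decoder argument is involved.
    return {"path": path, "image": EncodedImage(raw), "label": label}


def decode_pgm(encoded: EncodedImage) -> List[List[int]]:
    # Toy decoder for binary PGM (P5) files with a plain
    # "P5\n<width> <height>\n<maxval>\n" header and no comment lines.
    magic, dims, _maxval, pixels = encoded.data.split(b"\n", 3)
    if magic != b"P5":
        raise ValueError("not a binary PGM file")
    width, height = (int(v) for v in dims.split())
    return [list(pixels[row * width:(row + 1) * width]) for row in range(height)]


def decode_transform(sample: Dict[str, Any]) -> Dict[str, Any]:
    # Transform side: decode every encoded entry, pass everything else through.
    return {
        key: decode_pgm(value) if isinstance(value, EncodedImage) else value
        for key, value in sample.items()
    }


raw = b"P5\n2 2\n255\n" + bytes([0, 64, 128, 255])
sample = prepare_sample("img.pgm", raw, label=3)  # still encoded
decoded = decode_transform(sample)                # decoded later, by a transform
```

In this arrangement the dataset never needs to know which decoder the user wants; swapping the decoder only means swapping the transform.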

facebook-github-bot · 2022-01-26T15:58:18Z

💊 CI failures summary and remediations

As of commit 36957ff (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚


NicolasHug

Stamping! Happy to discuss and review in more detail down the road.

pmeier · 2022-01-26T15:59:13Z

test/builtin_dataset_mocks.py

-    wnids = tuple(info.extra.wnid_to_category.keys())
-    if config.split == "train":
-        images_root = root / "ILSVRC2012_img_train"
+    from scipy.io import savemat


While working on the fix for the validation split, I realized that the data setup was slightly wrong.
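
The category-order mismatch behind the validation-split fix can be illustrated with a small sketch. It assumes that the validation ground truth stores labels as indices into the dataset's own category order, while the library exposes categories in a different order (sorted here, purely as an assumption). The three WordNet IDs, the 0-based labels, and the helper `build_label_remap` are illustrative only.

```python
from typing import Dict, List

# Toy data. Order in which the dataset's own metadata lists the categories.
dataset_order: List[str] = ["n02119789", "n01440764", "n01443537"]
# Order exposed to users (assumed here to be sorted by WordNet ID).
library_order: List[str] = sorted(dataset_order)


def build_label_remap(src: List[str], dst: List[str]) -> Dict[int, int]:
    """Map an index in the `src` order to the index of the same category in `dst`."""
    dst_index = {wnid: idx for idx, wnid in enumerate(dst)}
    return {src_idx: dst_index[wnid] for src_idx, wnid in enumerate(src)}


remap = build_label_remap(dataset_order, library_order)

# Labels as they would be read from the validation ground-truth file.
val_labels_from_file = [0, 2, 1, 0]
# Labels after translating them into the library's category order.
val_labels = [remap[label] for label in val_labels_from_file]
```

Without the remapping step, every validation label would silently point at whichever category happens to share its index in the other ordering.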

pmeier · 2022-01-26T16:00:12Z

test/builtin_dataset_mocks.py

-                bndbox = {"xmin": "1", "xmax": "2", "ymin": "3", "ymax": "4"}
+        def add_size(obj):
+            obj = add_child(obj, "size")
+            size = {"width": 0, "height": 0, "depth": 3}


VOC provides the image size together with the annotations. Since the reworked BoundingBox requires the image size, we need to add it to the mock data.
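
As a rough illustration of where the size lives, the sketch below parses a VOC-style annotation with the standard library. The XML snippet is made up, and `parse_voc_annotation` is a hypothetical helper, not the code used in the dataset.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Made-up annotation following the VOC layout: one <size> element per image,
# one <bndbox> per <object>.
ANNOTATION = """
<annotation>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>1</xmin><ymin>3</ymin><xmax>2</xmax><ymax>4</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text: str) -> Dict[str, Any]:
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    # Image size as (height, width), read from the same file as the boxes.
    image_size = (int(size.findtext("height")), int(size.findtext("width")))
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Box coordinates in (xmin, ymin, xmax, ymax) order.
        boxes.append(
            tuple(int(bndbox.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax"))
        )
    return {"image_size": image_size, "boxes": boxes}


parsed = parse_voc_annotation(ANNOTATION)
```

Because the size comes from the annotation file, a bounding box can carry it without the image having to be decoded first.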

pmeier · 2022-01-26T16:01:37Z

test/test_prototype_transforms.py

@@ -1,61 +0,0 @@
-import pytest


Same deal as before. The tests are only partially compatible with the new features in the datasets. Thus, we remove them here and can re-add them when the transforms API is more stable.

pmeier · 2022-01-26T16:04:44Z

torchvision/prototype/datasets/_builtin/coco.py

    ) -> Dict[str, Any]:
        ann_data, image_data = data
        anns, image_meta = ann_data

-        sample = self._collate_and_decode_image(image_data, decoder=decoder)
-        if annotations:


We will never get to this point if annotations is None

pmeier · 2022-01-26T16:09:53Z

torchvision/prototype/datasets/_builtin/sbd.py

            dependencies=("scipy",),
            homepage="http://home.bharathh.info/pubs/codes/SBD/download.html",
            valid_options=dict(
                split=("train", "val", "train_noval"),
-                boundaries=(True, False),


If we already load the mat file that stores the annotations, there is no need to return only part of the data.

pmeier · 2022-01-26T16:10:32Z

torchvision/prototype/datasets/_builtin/sbd.py

        if config.split == "train_noval":
            split_dp = extra_split_dp
-        split_dp = Filter(split_dp, path_comparator("stem", config.split))
+
+        split_dp = Filter(split_dp, path_comparator("name", f"{config.split}.txt"))


Using the name is probably more readable than using the stem.
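
For readers unfamiliar with the two attributes, here is a small sketch of the difference. `path_comparator_sketch` is a simplified stand-in written for this example; the actual signature and behavior of `path_comparator` are not shown in this thread. The file names are made up.

```python
import pathlib
from typing import Callable


def path_comparator_sketch(attr: str, value: str) -> Callable[[str], bool]:
    """Return a predicate that compares one pathlib attribute of a path to `value`."""

    def compare(path: str) -> bool:
        return getattr(pathlib.PurePosixPath(path), attr) == value

    return compare


paths = ["dataset/train.txt", "dataset/train.mat", "dataset/val.txt"]

# "stem" drops the final suffix, so any file called train.* matches.
by_stem = [p for p in paths if path_comparator_sketch("stem", "train")(p)]
# "name" includes the suffix, so the expected file is spelled out in full.
by_name = [p for p in paths if path_comparator_sketch("name", "train.txt")(p)]
```

Matching on the name states the expected file explicitly, which is the readability point made above.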

pmeier · 2022-01-26T16:12:53Z

torchvision/prototype/datasets/utils/_internal.py

@@ -70,14 +66,6 @@ def read_mat(buffer: io.IOBase, **kwargs: Any) -> Any:
    return sio.loadmat(buffer, **kwargs)


-def image_buffer_from_array(array: np.ndarray, *, format: str = "png") -> io.BytesIO:


This is no longer needed as it always was a crutch to enable custom decoders for datasets that didn't contain encoded images in the first place.


Summary:
* revamp prototype features (#5283)
* remove decoding from prototype datasets (#5287)
* remove decoder from prototype datasets
* remove unused imports
* cleanup
* fix readme
* use OneHotLabel in SEMEION
* improve voc implementation
* revert unrelated changes
* fix semeion mock data
* fix pcam
* readd functional transforms API to prototype (#5295)
* readd functional transforms
* cleanup
* add missing imports
* remove __torch_function__ dispatch
* readd repr
* readd empty line
* add test for scriptability
* remove function copy
* change import from functional tensor transforms to just functional
* fix import
* fix test
* fix prototype features and functional transforms after review (#5377)
* fix prototype functional transforms after review
* address features review
* make mypy more strict on prototype features
* make mypy more strict for prototype transforms
* fix annotation
* fix kernel tests
* add automatic feature type dispatch to functional transforms (#5323)
* add auto dispatch
* fix missing arguments error message
* remove pil kernel for erase
* automate feature specific parameter detection
* fix typos
* cleanup dispatcher call
* remove __torch_function__ from transform dispatch
* remove auto-generation
* revert unrelated changes
* remove implements decorator
* change register parameter order
* change order of transforms for readability
* add documentation for __torch_function__
* fix mypy
* inline check for support
* refactor kernel registering process
* refactor dispatch to be a regular decorator
* split kernels and dispatchers
* remove sentinels
* replace pass with ...
* appease mypy
* make single kernel dispatchers more concise
* make dispatcher signatures more generic
* make kernel checking more strict
* revert doc changes
* address Franciscos comments
* remove inplace
* rename kernel test module
* fix inplace
* remove special casing for pil and vanilla tensors
* address comments
* update docs
* cleanup features / transforms feature branch (#5406)
* mark candidates for removal
* align signature of resize_bounding_box with corresponding image kernel
* fix documentation of Feature
* remove interpolation mode and antialias option from resize_segmentation_mask
* remove or privatize functionality in features / datasets / transforms

Reviewed By: sallysyw

Differential Revision: D34265747

fbshipit-source-id: 569ed9f74ac0c026391767c3b422ca0147f55ead

pmeier added 3 commits January 26, 2022 16:00

remove decoder from prototype datasets

1ca3a0f

remove unused imports

e21248c

cleanup

dab2774

pmeier added module: datasets prototype labels Jan 26, 2022

pytorch-bot bot added the ciflow/default label Jan 26, 2022

facebook-github-bot added the cla signed label Jan 26, 2022

NicolasHug approved these changes Jan 26, 2022


pmeier added 4 commits January 26, 2022 17:15

fix readme

3da21c6

use OneHotLabel in SEMEION

381f462

improve voc implementation

322d40c

revert unrelated changes

aaff6f6


pmeier changed the title ~~Datasets/remove decoder~~ remove decoding from prototype datasets Jan 26, 2022

pmeier mentioned this pull request Jan 26, 2022

[PoC] separate decoding from datasets #5105

Closed

fix semeion mock data

36957ff

pmeier merged commit be67431 into pytorch:revamp-prototype-features-transforms Jan 27, 2022

pmeier deleted the datasets/remove-decoder branch January 27, 2022 07:53

pmeier mentioned this pull request Feb 23, 2022

refactor prototype datasets to inherit from IterDataPipe #5448

Merged
