Skip to content

remove decoding from prototype datasets #5287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

pmeier
Copy link
Collaborator

@pmeier pmeier commented Jan 26, 2022

Same deal as #5283. Supersedes #5105.


  • Main change is to use the new encoded data feature types in the prototype datasets. This completely removes the need to pass a decoder to datasets.load. The decoding will be performed by a transform that will be added later on.
  • Doing that the decoding part would need to be dropped from the canonical _collate_and_decode_sample method. Since "collate" is also used to prepare a batch of data for model consumption, I've opted to change the name to _prepare_sample.
  • This also fixes the ImageNet validation split. We use a different order of the categories than the dataset for BC reasons. This was not considered in the current implementation. cc @datumbox
  • Add categories for the VOC dataset

@facebook-github-bot
Copy link

facebook-github-bot commented Jan 26, 2022

💊 CI failures summary and remediations

As of commit 36957ff (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamping! Happy to discuss and review in more details down the road

wnids = tuple(info.extra.wnid_to_category.keys())
if config.split == "train":
images_root = root / "ILSVRC2012_img_train"
from scipy.io import savemat
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While working on the fix for the validation split, I realized that the data setup was slightly wrong.

bndbox = {"xmin": "1", "xmax": "2", "ymin": "3", "ymax": "4"}
def add_size(obj):
obj = add_child(obj, "size")
size = {"width": 0, "height": 0, "depth": 3}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VOC provides the image size together with the annotations. Since the reworked BoundingBox requires the image size, we need to add it to the mock data.

@@ -1,61 +0,0 @@
import pytest
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same deal as before. The tests are only partially compatible with the new features in the datasets. Thus, we remove them here and can re-add them when the transforms API is more stable.

) -> Dict[str, Any]:
ann_data, image_data = data
anns, image_meta = ann_data

sample = self._collate_and_decode_image(image_data, decoder=decoder)
if annotations:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will never get to this point if annotations is None

dependencies=("scipy",),
homepage="http://home.bharathh.info/pubs/codes/SBD/download.html",
valid_options=dict(
split=("train", "val", "train_noval"),
boundaries=(True, False),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we already load the mat file that stores the annotations, there is no need to only return only part of the data.

if config.split == "train_noval":
split_dp = extra_split_dp
split_dp = Filter(split_dp, path_comparator("stem", config.split))

split_dp = Filter(split_dp, path_comparator("name", f"{config.split}.txt"))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the name is probably more readable than using the stem.

@@ -70,14 +66,6 @@ def read_mat(buffer: io.IOBase, **kwargs: Any) -> Any:
return sio.loadmat(buffer, **kwargs)


def image_buffer_from_array(array: np.ndarray, *, format: str = "png") -> io.BytesIO:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer needed as it always was a crutch to enable custom decoders for datasets that didn't contain encoded images in the first place.

@pmeier pmeier changed the title Datasets/remove decoder remove decoding from prototype datasets Jan 26, 2022
@pmeier pmeier merged commit be67431 into pytorch:revamp-prototype-features-transforms Jan 27, 2022
@pmeier pmeier deleted the datasets/remove-decoder branch January 27, 2022 07:53
datumbox pushed a commit that referenced this pull request Feb 11, 2022
* revamp prototype features (#5283)

* remove decoding from prototype datasets (#5287)

* remove decoder from prototype datasets

* remove unused imports

* cleanup

* fix readme

* use OneHotLabel in SEMEION

* improve voc implementation

* revert unrelated changes

* fix semeion mock data

* fix pcam

* readd functional transforms API to prototype (#5295)

* readd functional transforms

* cleanup

* add missing imports

* remove __torch_function__ dispatch

* readd repr

* readd empty line

* add test for scriptability

* remove function copy

* change import from functional tensor transforms to just functional

* fix import

* fix test

* fix prototype features and functional transforms after review (#5377)

* fix prototype functional transforms after review

* address features review

* make mypy more strict on prototype features

* make mypy more strict for prototype transforms

* fix annotation

* fix kernel tests

* add automatic feature type dispatch to functional transforms (#5323)

* add auto dispatch

* fix missing arguments error message

* remove pil kernel for erase

* automate feature specific parameter detection

* fix typos

* cleanup dispatcher call

* remove __torch_function__ from transform dispatch

* remove auto-generation

* revert unrelated changes

* remove implements decorator

* change register parameter order

* change order of transforms for readability

* add documentation for __torch_function__

* fix mypy

* inline check for support

* refactor kernel registering process

* refactor dispatch to be a regular decorator

* split kernels and dispatchers

* remove sentinels

* replace pass with ...

* appease mypy

* make single kernel dispatchers more concise

* make dispatcher signatures more generic

* make kernel checking more strict

* revert doc changes

* address Franciscos comments

* remove inplace

* rename kernel test module

* fix inplace

* remove special casing for pil and vanilla tensors

* address comments

* update docs

* cleanup features / transforms feature branch (#5406)

* mark candidates for removal

* align signature of resize_bounding_box with corresponding image kernel

* fix documentation of Feature

* remove interpolation mode and antialias option from resize_segmentation_mask

* remove or privatize functionality in features / datasets / transforms
facebook-github-bot pushed a commit that referenced this pull request Feb 16, 2022
Summary:
* revamp prototype features (#5283)

* remove decoding from prototype datasets (#5287)

* remove decoder from prototype datasets

* remove unused imports

* cleanup

* fix readme

* use OneHotLabel in SEMEION

* improve voc implementation

* revert unrelated changes

* fix semeion mock data

* fix pcam

* readd functional transforms API to prototype (#5295)

* readd functional transforms

* cleanup

* add missing imports

* remove __torch_function__ dispatch

* readd repr

* readd empty line

* add test for scriptability

* remove function copy

* change import from functional tensor transforms to just functional

* fix import

* fix test

* fix prototype features and functional transforms after review (#5377)

* fix prototype functional transforms after review

* address features review

* make mypy more strict on prototype features

* make mypy more strict for prototype transforms

* fix annotation

* fix kernel tests

* add automatic feature type dispatch to functional transforms (#5323)

* add auto dispatch

* fix missing arguments error message

* remove pil kernel for erase

* automate feature specific parameter detection

* fix typos

* cleanup dispatcher call

* remove __torch_function__ from transform dispatch

* remove auto-generation

* revert unrelated changes

* remove implements decorator

* change register parameter order

* change order of transforms for readability

* add documentation for __torch_function__

* fix mypy

* inline check for support

* refactor kernel registering process

* refactor dispatch to be a regular decorator

* split kernels and dispatchers

* remove sentinels

* replace pass with ...

* appease mypy

* make single kernel dispatchers more concise

* make dispatcher signatures more generic

* make kernel checking more strict

* revert doc changes

* address Franciscos comments

* remove inplace

* rename kernel test module

* fix inplace

* remove special casing for pil and vanilla tensors

* address comments

* update docs

* cleanup features / transforms feature branch (#5406)

* mark candidates for removal

* align signature of resize_bounding_box with corresponding image kernel

* fix documentation of Feature

* remove interpolation mode and antialias option from resize_segmentation_mask

* remove or privatize functionality in features / datasets / transforms

Reviewed By: sallysyw

Differential Revision: D34265747

fbshipit-source-id: 569ed9f74ac0c026391767c3b422ca0147f55ead
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants