Questions about prototype builtin datasets using torchdata #7609

Closed

Description

@ain-soph

Hi all, I'm currently exploring the builtin datasets that follow the new standard:
https://github.com/pytorch/vision/blob/main/torchvision/prototype/datasets

Let's take CIFAR10 as an example. I have several questions:

  1. Why are all datasets constructed as iter-style rather than map-style? When I have an index (e.g., 2331), I can no longer use dataset[2331] like the old CIFAR10.
    In this case, how do I get a single item from the new-format dataset? Do I have to use IterToMapConverter? That would be quite strange: the raw data format is map-style, yet I would turn it into iter-style and then traverse it just to convert it back to map-style (see sketch 1 after this list).
  2. What does hint_shuffling do?
    def hint_shuffling(datapipe: IterDataPipe[D]) -> Shuffler[D]:
        return Shuffler(datapipe, buffer_size=INFINITE_BUFFER_SIZE).set_shuffle(False)
    It's used in all prototype datasets. It wraps the datapipe in a Shuffler but then calls set_shuffle(False). Doesn't that end up doing nothing? (My current guess is in sketch 2 below.)
  3. When should I use Decompressor, and when should I set resource.preprocess='decompress' or 'extract'?
    What's the difference between Decompressor, resource.preprocess='decompress', resource.preprocess='extract', and using nothing at all? (See sketch 3 below.)
    • The Cifar10 resource is a cifar-10-python.tar.gz and sets nothing; by default, OnlineResource.load calls _guess_archive_loader to generate a TarArchiveLoader.
    • The MNIST resource is a train-images-idx3-ubyte.gz and uses a Decompressor.
    • The cub200 resource is a CUB_200_2011.tgz and uses decompress=True.
  4. How do I use a Transform, such as AutoAugment or RandomCrop, with the new dataset API? I'm especially unsure about ToTensor, or transforms.PILToTensor() followed by transforms.ConvertImageDtype(torch.float) (since the prototype dataset already returns a uint8 Tensor). From the Transforms V2 tutorial page, I assume transforms are no longer embedded in the Dataset, since it no longer accepts transform or target_transform args. How, then, can I fetch augmented data from the DataLoader? (See sketch 4 below.)
  5. For datasets where each image is stored in an encoded image format (the old ImageFolder type, e.g., ImageNet, GTSRB), the output image type is EncodedImage -> EncodedData -> Datapoint. For datasets stored in binary (e.g., MNIST and CIFAR), the output image type is Image -> Datapoint. Why are they different? Most transforms V2 APIs operate on Image, so why is EncodedImage used here? (See sketch 5 below.)
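
Sketch 1 (question 1): here is roughly what I mean. A minimal sketch, assuming torchdata's enumerate()/to_map_datapipe() functionals (i.e., IterToMapConverter) are the intended route:

    from torchvision.prototype import datasets

    dp = datasets.load("cifar10", split="train")
    # enumerate() yields (index, sample) pairs, which to_map_datapipe()
    # treats as (key, value); the pipe is traversed once to fill the map.
    map_dp = dp.enumerate().to_map_datapipe()
    sample = map_dp[2331]  # random access works again, at the cost above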
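
Sketch 2 (question 2): my current understanding, please correct me if it's wrong. The disabled Shuffler stays in the graph as a hint; DataLoader(shuffle=True) walks the datapipe graph and re-enables every Shuffler it finds, so shuffling then happens at the hinted position:

    from torch.utils.data import DataLoader
    from torchvision.prototype import datasets

    dp = datasets.load("cifar10", split="train")
    # shuffle=True flips the set_shuffle(False) hint back on ...
    loader = DataLoader(dp, batch_size=32, shuffle=True)
    # ... while shuffle=False (the default) leaves the Shuffler inert.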
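
Sketch 3 (question 3): the three cases as I understand them (the file names below are just placeholders for the downloaded resources):

    from torchdata.datapipes.iter import FileOpener, IterableWrapper

    # MNIST: a single gzip-compressed file, not an archive. Decompressor
    # (functional form .decompress()) strips the one compression layer.
    mnist = FileOpener(IterableWrapper(["train-images-idx3-ubyte.gz"]), mode="b")
    mnist = mnist.decompress()  # one (path, decompressed stream) pair

    # CIFAR-10: a .tar.gz archive with many members. An archive loader
    # (what _guess_archive_loader picks) iterates over the members.
    cifar = FileOpener(IterableWrapper(["cifar-10-python.tar.gz"]), mode="b")
    cifar = cifar.load_from_tar()  # one (path, stream) pair per member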
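
Sketch 4 (question 4): what I currently guess the intended usage is, assuming the prototype transforms accept the sample dicts that the datasets yield:

    import torch
    from torch.utils.data import DataLoader
    from torchvision.prototype import datasets, transforms

    dp = datasets.load("cifar10", split="train")
    # no ToTensor/PILToTensor needed: "image" is already a uint8 tensor
    dp = dp.map(transforms.RandomCrop(32, padding=4))
    dp = dp.map(transforms.ConvertImageDtype(torch.float32))
    loader = DataLoader(dp, batch_size=32, shuffle=True)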
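
Sketch 5 (question 5): my guess is that EncodedImage carries the raw file bytes as a uint8 tensor so the JPEG/PNG decode can be deferred until the pixels are actually needed. A sketch, assuming decode_image accepts that byte tensor directly:

    from torchvision.io import decode_image
    from torchvision.prototype import datasets

    dp = datasets.load("gtsrb", split="train")

    def decode_sample(sample):
        # turn the encoded bytes into a regular CHW uint8 image tensor
        sample["image"] = decode_image(sample["image"])
        return sample

    dp = dp.map(decode_sample)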
