add prototype dataset for CelebA #4514
Conversation
LGTM. Just want to discuss whether we can eliminate CelebACSVParser and use CSVDictParser instead.
# Empty field names are filtered out, because some files have an extra white space after the header
# line, which is recognized as an extra column
fieldnames = [name for name in next(csv.reader([next(file)], **self._fmtparams)) if name]
# Some files do not include a label for the image ID column
if fieldnames[0] != "image_id":
    fieldnames.insert(0, "image_id")
It is super annoying that we can't use DictReader. Since we have three datapipes with headers, could we hard-code the fieldnames for each one and use CSVDictReader(dp, skip_lines=2, fieldnames=[...])?
We could, but there is more to it. Note that we also need to map the output so that it is a tuple with the image ID first and the remaining row second. At that point we would probably end up with a more elaborate implementation bending everything to the default building blocks than if we just wrote our own. I agree it is annoying, but writing and understanding this custom parser is not hard, so I feel it's warranted.
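For reference, here is a minimal standalone sketch of what the custom parsing boils down to, using only the stdlib csv module. The helper name, the fmtparams, and the assumption that the first line only holds a row count (as the skip_lines=2 suggestion above implies) are illustrative, not the actual implementation:

import csv

def parse_celeba_csv(file, fmtparams=None):
    # Hypothetical, simplified version of the parsing discussed above.
    fmtparams = fmtparams or {"delimiter": " ", "skipinitialspace": True}
    lines = (line.decode() if isinstance(line, bytes) else line for line in file)
    next(lines)  # assumed: the first line only holds the number of rows
    # Drop empty field names caused by trailing white space in the header line.
    fieldnames = [name for name in next(csv.reader([next(lines)], **fmtparams)) if name]
    # Some files omit a header for the image ID column, so add it back.
    if fieldnames[0] != "image_id":
        fieldnames.insert(0, "image_id")
    for row in csv.DictReader(lines, fieldnames=fieldnames, **fmtparams):
        # Yield the image ID separately from the remaining annotation fields.
        yield row.pop("image_id"), row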
for partial_anns_dp in partial_anns_dps:
    anns_dp = KeyZipper(anns_dp, partial_anns_dp, lambda data: data[0], buffer_size=INFINITE_BUFFER_SIZE)
    anns_dp = Mapper(anns_dp, self._collate_partial_anns)
I will add an issue to the data repo about KeyZipper. Stacking KeyZippers like this is a super anti-pattern.
Let me see if the files are perfectly aligned. If so, we could use a Zipper instead, which can handle more than two datapipes at once.
They are perfectly aligned 🎉 The latest commits only need two KeyZippers to zip the splits, images, and annotations. We can't get around them at the moment.
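For illustration, a plain-Python sketch of the difference (the real KeyZipper / Zipper are torchdata datapipes with different signatures; every name and value below is made up):

def key_zip(left, right, left_key, right_key):
    # Join two streams on a key, buffering the entire right-hand stream.
    buffered = {right_key(item): item for item in right}
    for item in left:
        yield item, buffered[left_key(item)]

# Three annotation streams keyed by image ID (toy data).
attrs = [("0001.jpg", {"Smiling": 1}), ("0002.jpg", {"Smiling": -1})]
bboxes = [("0001.jpg", (95, 71, 226, 313)), ("0002.jpg", (72, 94, 221, 306))]
landmarks = [("0001.jpg", (165, 184)), ("0002.jpg", (140, 204))]

# Stacked key-based joins nest the output and need a buffer per join:
# each item looks like ((attr, bbox), landmark).
stacked = key_zip(
    key_zip(attrs, bboxes, lambda d: d[0], lambda d: d[0]),
    landmarks,
    lambda d: d[0][0],
    lambda d: d[0],
)

# With perfectly aligned files, positional zipping handles any number of
# streams at once and keeps the output flat: (attr, bbox, landmark).
flat = zip(attrs, bboxes, landmarks)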
def __iter__(self):
    for _, file in self.datapipe:
        file = (line.decode() for line in file)
It's fine, since the whole file can fit into memory. But to optimize it a little, could we change it to a streaming style by adding a decode method that yields each decoded line?
Yeah, that would be a lot better. I'll send a patch.
After revisiting this, I think it is already doing what you proposed. Writing

file = (line.decode() for line in file)

is functionally equivalent to

def decode(file):
    for line in file:
        yield line.decode()

file = decode(file)
You are right!
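For completeness, a toy check (hypothetical byte stream, not the actual datapipe) that the generator expression really does decode lazily:

def noisy_bytes():
    # Stand-in for a binary file; prints whenever a line is actually read.
    for raw in (b"a\n", b"b\n"):
        print("reading", raw)
        yield raw

decoded = (line.decode() for line in noisy_bytes())  # nothing is read yet
first = next(decoded)  # now prints: reading b'a\n'
assert first == "a\n"  # only the first line was read and decoded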
Summary: * add prototype dataset for CelebA * fix code format * fix mypy * hardcode fmtparams * fix mypy * replace KeyZipper with Zipper for annotations Reviewed By: NicolasHug Differential Revision: D31505573 fbshipit-source-id: bc2a66dffd410d51bc5b240bd32b344000248f00
cc @pmeier @mthrok @bjuncek