-
Notifications
You must be signed in to change notification settings - Fork 7.1k
add CLEVR dataset #5130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add CLEVR dataset #5130
Conversation
💊 CI failures summary and remediationsAs of commit 35f464c (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚
🚧 5 ongoing upstream failures:These were probably caused by upstream breakages that are not fixed yet.
This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pmeier , I made a few comments below but this looks good
def _getattr_closure(obj: Any, *, attrs: Sequence[str]) -> Any: | ||
for attr in attrs: | ||
obj = getattr(obj, attr) | ||
return obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you help me understand why this change is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This enables chained getattr
calls. In turn, this allows me to use this filter function
images_dp = Filter(images_dp, path_comparator("parent.name", config.split))
rather than writing a custom function that extracts the name of the parent folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it up to you to decide, as the entire logic of path_comparator
and path_accessor
is honestly too complex for me to handle ATM.
But as rule of thumb, I tend to avoid changes to helper functions when they just only address one single use-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as the entire logic of
path_comparator
andpath_accessor
is honestly too complex for me to handle ATM.
The complexity stems from the fact that we cannot use lambdas or local function together with the datapipes, since they cannot be serialized safely. Quick explanation:
-
path_accessor
: When using datapipes you often have path-handle-tuples for files.path_accessor
takes such a tuple, turns the the first item into apathlib.Path
and accesses it according to the input. Sopath_accessor("parent.name")
is equivalent todef extract_folder_name(data): path = pathlib.Path(data[0]) return path.parent.name
This is useful when you want to merge two datapipes and need a function that generates merge keys.
-
path_comparator
: This is one layer on top ofpath_accessor
by providing an equality comparison for the extract path information. For exampleFilter(scenes_dp, path_comparator("name", f"CLEVR_{config.split}_scenes.json"))
will select all files where the file name matches the second argument. (This is what I used to refactor your suggestion about the regex).
But as rule of thumb, I tend to avoid changes to helper functions when they just only address one single use-case.
It is true, that this currently only addresses this, but I already needed to make the same changes to getitem
a while back. So I'm guessing if I don't do it now, I'll have to do it in the future anyway, but then I also have to remember that I need to go back to here and fix this.
Co-authored-by: Nicolas Hug <[email protected]>
…o datasets/clevr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @pmeier , LGTM!
Summary: * add prototype dataset * add old-style dataset * appease mypy * simplify prototype scenes * Update torchvision/datasets/clevr.py Reviewed By: sallysyw Differential Revision: D33479266 fbshipit-source-id: c9a130490a572546fdc6bc8b88ba21b8e9aebc30 Co-authored-by: Nicolas Hug <[email protected]> Co-authored-by: Nicolas Hug <[email protected]>
Addresses #5108.
cc @pmeier @bjuncek