Added default no-op function for flatmap #749
Conversation
Hi @mobikoz! Thank you for your pull request and welcome to our community.
Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly.
If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thanks for contributing this PR. Just some nit comments.
Waiting for the company CLA signature; it should be done shortly.
One last thing - can you add a unit test around here?
https://github.com/pytorch/data/blob/main/test/test_iterdatapipe.py#L692
Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Unit test added. I also fixed _no_op_fn to always return iterables.
See the comment about _no_op_fn and let me know what you think.
torchdata/datapipes/utils/common.py
Outdated
    if len(args) == 1 and isinstance(args[0], (list, tuple)):
        return args[0]
    else:
        return args
Suggested change:
    - if len(args) == 1 and isinstance(args[0], (list, tuple)):
    -     return args[0]
    - else:
    -     return args
    + if len(args) == 1:
    +     try:
    +         return iter(args[0])
    +     except TypeError:
    +         pass
    + return args
@mobikoz What do you think of this instead? This will allow objects that are not list or tuple to be unpacked (e.g. numpy.array).
cc: @ejguan
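To make the difference between the two variants concrete, here is a minimal sketch in plain Python. The names no_op_isinstance and no_op_iter are hypothetical stand-ins for the two versions of _no_op_fn in the suggestion above, not the code that was eventually merged, and a range stands in for a non-list iterable such as numpy.array.

```python
def no_op_isinstance(*args):
    # Original version: unwrap a single argument only if it is a list or tuple.
    if len(args) == 1 and isinstance(args[0], (list, tuple)):
        return args[0]
    return args


def no_op_iter(*args):
    # Suggested version: unwrap a single argument whenever iter() accepts it.
    if len(args) == 1:
        try:
            return iter(args[0])
        except TypeError:
            pass
    return args


print(list(no_op_isinstance([1, 2])))    # [1, 2]
print(list(no_op_isinstance(range(3))))  # [range(0, 3)]  (not unwrapped)
print(list(no_op_iter(range(3))))        # [0, 1, 2]      (unwrapped)
print(list(no_op_iter(5)))               # [5]            (falls back to args)
print(list(no_op_iter("ab")))            # ['a', 'b']     (strings are iterable too)
```

Note that the iter-based variant also splits strings into characters, which may or may not be desirable.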
May I ask why we want to differentiate between the case of size 1 and other sizes?
    list(dp)  # [[1], [2], [3], [4]]
    list(dp.flatmap())  # I would expect the result to be [1, 2, 3, 4]
BTW, if this function is not commonly used by other DataPipes, we can put it in the callable.py file.
"Can I know why do we want to differentiate the cases between size 1 and other sizes?"
As far as I can tell, because the input to the function is *args, there are two main cases where size == 1:
- input_col=None
- input_col only specifies one column
Moving it from common.py to callable.py makes sense to me.
@ejguan This distinction is a consequence of using variable-length *args to support the input_col parameter.
If it is not handled, the given example produces:
    list(dp)  # [[1], [2], [3], [4]]
    list(dp.flatmap())  # result will be [[1], [2], [3], [4]]
This is because args is always a tuple; for the first element here, it is ([1],).
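The effect described here can be reproduced without torchdata. The sketch below is a simplified stand-in, assuming only that flatmap calls the function with each element as a single positional argument and then uses yield from on the result; the helper names flatmap, naive_no_op, and unwrapping_no_op are hypothetical.

```python
def flatmap(source, fn):
    # Simplified stand-in for flatmap with input_col=None:
    # each element is passed to fn as one positional argument.
    for item in source:
        yield from fn(item)


def naive_no_op(*args):
    # Returns the args tuple untouched, e.g. ([1],) for the element [1].
    return args


def unwrapping_no_op(*args):
    # Unwraps a lone list/tuple argument so its contents are yielded instead.
    if len(args) == 1 and isinstance(args[0], (list, tuple)):
        return args[0]
    return args


data = [[1], [2], [3], [4]]
print(list(flatmap(data, naive_no_op)))       # [[1], [2], [3], [4]]
print(list(flatmap(data, unwrapping_no_op)))  # [1, 2, 3, 4]
```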
I see, makes sense. Thanks for the explanation.
Currently it works like this:
    list(dp)  # [[1, 2, 3], [4, 5, 6]]
    list(dp.flatmap(input_col=1))  # [2, 5]

    list(dp)  # [1, 2, 3, 4, 5, 6]
    list(dp.flatmap())  # [1, 2, 3, 4, 5, 6]
Isn't that the expected behavior?
Just double-checking that I understand correctly.
BTW, this is analogous to calling flatmap with some fn, e.g.
    def fn(e):
        return [e, e * 10]
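For reference, the behavior described in the last two comments can be modeled in plain Python. This is a rough sketch, not the torchdata implementation: flatmap and no_op are hypothetical helpers, and input_col handling is reduced to indexing a single column.

```python
def no_op(*args):
    # Stand-in for the default function at this point in the review.
    if len(args) == 1 and isinstance(args[0], (list, tuple)):
        return args[0]
    return args


def flatmap(source, fn=no_op, input_col=None):
    # Pass either the whole element or one selected column to fn,
    # then flatten whatever fn returns.
    for item in source:
        arg = item if input_col is None else item[input_col]
        yield from fn(arg)


def fn(e):
    return [e, e * 10]


print(list(flatmap([[1, 2, 3], [4, 5, 6]], input_col=1)))  # [2, 5]
print(list(flatmap([1, 2, 3, 4, 5, 6])))                   # [1, 2, 3, 4, 5, 6]
print(list(flatmap([1, 2, 3], fn)))                        # [1, 10, 2, 20, 3, 30]
```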
I might be wrong, but if all you want is to select an index, you might want to do this instead:
    from operator import itemgetter
    dp = IterableWrapper([[1, 2, 3], [4, 5, 6]]).map(itemgetter(1))
    list(dp)  # [2, 5]
The case I worry about is something like this:
    dp = IterableWrapper([[1, [2, 2], 3], [4, [5, 5], 6], [7, 8, 9], [10, [11, 11], 12]]).flatmap(input_col=1)
The 3rd list will silently unpack to 8 even though it is different from the other lists.
@ejguan any thoughts on which behavior is better?
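A plain-Python sketch of this concern, assuming the iter-based no-op suggested earlier in the thread and a simplified, hypothetical flatmap helper (not the torchdata class):

```python
def no_op_iter(*args):
    # Iter-based variant: fall back to the args tuple when the value is not iterable.
    if len(args) == 1:
        try:
            return iter(args[0])
        except TypeError:
            pass
    return args


def flatmap(source, fn=no_op_iter, input_col=None):
    for item in source:
        arg = item if input_col is None else item[input_col]
        yield from fn(arg)


data = [[1, [2, 2], 3], [4, [5, 5], 6], [7, 8, 9], [10, [11, 11], 12]]
# The scalar 8 in the third row is yielded as-is, with no error or warning.
print(list(flatmap(data, input_col=1)))  # [2, 2, 5, 5, 8, 11, 11]
```

The irregular row is indistinguishable in the output, which is the silent behavior being questioned.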
How about this?
    list(dp)  # [[1, [2], 3], [4, [5], 6]]
    list(dp.flatmap(input_col=1))  # I expect [2, 5] rather than [[2], [5]]
I feel it's fine to remove iter from the current implementation, as long as we make it clear in the comment that we only allow output that can be unpacked by yield from.
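One possible reading of this suggestion, sketched in plain Python: the default function hands the value back unchanged, and yield from itself rejects anything that cannot be unpacked. The helpers strict_no_op and flatmap are hypothetical and are not taken from the merged code.

```python
def strict_no_op(x):
    # Return the value unchanged; the result must be unpackable by `yield from`.
    return x


def flatmap(source, fn=strict_no_op, input_col=None):
    for item in source:
        arg = item if input_col is None else item[input_col]
        yield from fn(arg)


data = [[1, [2], 3], [4, [5], 6]]
print(list(flatmap(data, input_col=1)))  # [2, 5]

# A non-iterable column value now fails loudly instead of passing through.
try:
    list(flatmap([[7, 8, 9]], input_col=1))
except TypeError as err:
    print("TypeError:", err)
```

Under this reading, irregular rows surface as an error rather than being silently accepted.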
Pushed corrected implementation
b65d5f5 to 0c33295
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
See comment for one last change and we will merge. Thank you!
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Thank you so much!
Fixes #738
Changes
in datapipe's common functions:
in FlatMapperIterDataPipe class