Added default no-op function for flatmap #749

Closed
mobikoz wants to merge 5 commits

@mobikoz mobikoz commented Aug 22, 2022

Fixes #738

Changes

In the datapipes' common functions:

  • added the _no_op_fn definition

In the FlatMapperIterDataPipe class:

  • used _no_op_fn as the default callable

@facebook-github-bot
Contributor

Hi @mobikoz!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

Contributor

@NivekT NivekT left a comment

Thanks for contributing this PR. Just some nit comments.

@NivekT NivekT linked an issue Aug 22, 2022 that may be closed by this pull request
@mobikoz
Author

mobikoz commented Aug 23, 2022

Waiting for the company CLA signature; it should be done shortly.

Contributor

@NivekT NivekT left a comment

One last thing - can you add a unit test around here?

https://github.com/pytorch/data/blob/main/test/test_iterdatapipe.py#L692

Thanks!

@facebook-github-bot facebook-github-bot added the CLA Signed label Aug 24, 2022
@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@mobikoz
Author

mobikoz commented Aug 24, 2022

Unit test added. I also fixed _no_op_fn so that it always returns an iterable.

@mobikoz mobikoz marked this pull request as ready for review August 24, 2022 15:08
Contributor

@NivekT NivekT left a comment

See comment about _no_op_fn, let me know what you think.

Comment on lines 31 to 34
    if len(args) == 1 and isinstance(args[0], (list, tuple)):
        return args[0]
    else:
        return args
Contributor

Suggested change
-    if len(args) == 1 and isinstance(args[0], (list, tuple)):
-        return args[0]
-    else:
-        return args
+    if len(args) == 1:
+        try:
+            return iter(args[0])
+        except TypeError:
+            pass
+    return args

@mobikoz What do you think of this instead? This will allow objects that are not list or tuple to be unpacked (e.g. numpy.array).

cc: @ejguan
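
As a rough illustration of the difference between the two variants in this suggestion, here is a plain-Python sketch. The names no_op_strict and no_op_iter are hypothetical, chosen only for this example; they are not the names used in the PR, and the merged implementation may differ.

```python
def no_op_strict(*args):
    # Variant from the original diff: unpack only a single list or tuple.
    if len(args) == 1 and isinstance(args[0], (list, tuple)):
        return args[0]
    return args


def no_op_iter(*args):
    # Suggested variant: unpack any single argument that supports iter().
    if len(args) == 1:
        try:
            return iter(args[0])
        except TypeError:
            pass
    return args


# A range is iterable, but it is neither a list nor a tuple.
strict_result = list(no_op_strict(range(3)))  # range stays wrapped in the args tuple
iter_result = list(no_op_iter(range(3)))      # range is unpacked element by element
scalar_result = list(no_op_iter(5))           # int is not iterable, falls back to args
string_result = list(no_op_iter("ab"))        # strings are iterable, so they unpack too

print(strict_result)  # [range(0, 3)]
print(iter_result)    # [0, 1, 2]
print(scalar_result)  # [5]
print(string_result)  # ['a', 'b']
```

Note that the iter-based variant also splits strings into characters, which may or may not be desirable.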

Contributor

@ejguan ejguan Aug 24, 2022

May I ask why we want to differentiate between the size-1 case and other sizes?

list(dp)  # [[1], [2], [3], [4],]
list(dp.flatmap())  # I would expect the result as [1, 2, 3, 4]

BTW, if this function is not commonly used by other DataPipes, we can put it in the callable.py file.

Contributor

@NivekT NivekT Aug 24, 2022

May I ask why we want to differentiate between the size-1 case and other sizes?

As far as I can tell, because the input to the function is *args, there are two main cases where size == 1:

  1. input_col=None
  2. input_col only specifies one column

Moving it from common.py to callable.py makes sense to me.

Author

@mobikoz mobikoz Aug 25, 2022

@ejguan This differentiation is a consequence of using variable-length *args to support the input_col parameter.
If it is not handled, the given example behaves like this:
list(dp)  # [[1], [2], [3], [4],]
list(dp.flatmap())  # result will be [[1], [2], [3], [4],]

This is because args is always a tuple; in this case, for example, it is ([1],) for the first element.
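
The *args wrapping described here can be reproduced without torchdata. The sketch below is only an approximation: flatten is a hypothetical stand-in for the flatmap loop, and identity_args and unwrap_single are illustrative names, not functions from the PR.

```python
def identity_args(*args):
    # Python always collects positional arguments into a tuple,
    # so a single element [1] arrives as ([1],).
    return args


def unwrap_single(*args):
    # Handle the size-1 case explicitly by returning the sole argument.
    if len(args) == 1:
        return args[0]
    return args


def flatten(source, fn):
    # Hypothetical stand-in for flatmap: apply fn, then unpack its result.
    for item in source:
        yield from fn(item)


data = [[1], [2], [3], [4]]
received = identity_args(data[0])
unflattened = list(flatten(data, identity_args))
flattened = list(flatten(data, unwrap_single))

print(received)     # ([1],)
print(unflattened)  # [[1], [2], [3], [4]]
print(flattened)    # [1, 2, 3, 4]
```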

Contributor

I see, makes sense. Thanks for the explanation.

Author

@mobikoz mobikoz Aug 26, 2022

Currently it works like this:
list(dp) # [[1, 2, 3], [4, 5, 6]]
list(dp.flatmap(input_col=1)) # [2, 5]

list(dp) # [1, 2, 3, 4, 5, 6]
list(dp.flatmap()) # [1, 2, 3, 4, 5, 6]

Isn't this the expected behavior?
Just double-checking that I understand correctly.

Author

@mobikoz mobikoz Aug 26, 2022

By the way, this is analogous to calling flatmap with some fn, e.g.:

def fn(e):
    return [e, e * 10]
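
For reference, this is roughly how such an fn behaves under flat-mapping. The flatmap helper below is a hypothetical plain-Python stand-in, not the torchdata implementation.

```python
def fn(e):
    # Map one input element to two output elements.
    return [e, e * 10]


def flatmap(source, fn):
    # Hypothetical stand-in: apply fn to each item and unpack the result.
    for item in source:
        yield from fn(item)


result = list(flatmap([1, 2, 3], fn))
print(result)  # [1, 10, 2, 20, 3, 30]
```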

Contributor

I might be wrong, but if all you want is to select an index, you might want to do this instead:

from operator import itemgetter
from torchdata.datapipes.iter import IterableWrapper

dp = IterableWrapper([[1, 2, 3], [4, 5, 6]]).map(itemgetter(1))
list(dp)  # [2, 5]

The case I worry about is something like this:

dp = IterableWrapper([[1, [2, 2], 3], [4, [5, 5], 6], [7, 8, 9], [10, [11, 11], 12]]).flatmap(input_col=1)

The third list will silently yield 8, even though its column 1 has a different shape from that of the other lists.

@ejguan any thoughts on which behavior is better?
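
The concern above can be sketched in plain Python. This assumes that input_col=1 passes row[1] to the function, and it uses the iter-with-fallback variant suggested earlier in the thread; no_op_iter and flatmap_column are hypothetical names for illustration only.

```python
def no_op_iter(*args):
    # iter-with-fallback variant: unpack a single iterable argument,
    # otherwise return the args tuple unchanged.
    if len(args) == 1:
        try:
            return iter(args[0])
        except TypeError:
            pass
    return args


def flatmap_column(rows, col, fn):
    # Assumption: selecting a column means passing row[col] to fn.
    for row in rows:
        yield from fn(row[col])


rows = [[1, [2, 2], 3], [4, [5, 5], 6], [7, 8, 9], [10, [11, 11], 12]]
result = list(flatmap_column(rows, 1, no_op_iter))

# The scalar 8 from the third row appears without any error or warning.
print(result)  # [2, 2, 5, 5, 8, 11, 11]
```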

Contributor

How about this?

list(dp) # [[1, [2], 3], [4, [5], 6]]
list(dp.flatmap(input_col=1))  # I expect [2, 5] rather than [[2], [5]]

I feel like it's fine to remove iter from the current implementation, as long as we make it clear in a comment that we only allow output that can be unpacked by yield from.
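
One way to read this proposal, sketched in plain Python: drop the iter call and let yield from raise on anything it cannot unpack. The names no_op_plain and flatmap_column are hypothetical, and the assumption that input_col=1 passes row[1] to the function is made only for this example; the pushed implementation may differ.

```python
def no_op_plain(*args):
    # No iter() call: the result must be unpackable by `yield from`,
    # otherwise the caller sees a TypeError.
    if len(args) == 1:
        return args[0]
    return args


def flatmap_column(rows, col):
    # Assumption: selecting a column means passing row[col] to the function.
    for row in rows:
        yield from no_op_plain(row[col])


ok = list(flatmap_column([[1, [2], 3], [4, [5], 6]], 1))
print(ok)  # [2, 5]

# A scalar column value now fails loudly instead of passing through.
try:
    list(flatmap_column([[7, 8, 9]], 1))
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```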

Author

Pushed corrected implementation

@facebook-github-bot
Contributor

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@NivekT NivekT left a comment

See comment for one last change and we will merge. Thank you!

@facebook-github-bot
Contributor

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@NivekT NivekT left a comment

Thank you so much!

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Development

Successfully merging this pull request may close these issues.

Allow flatmap to have no-op option which will vertically flatten datapipe Chainer/Concater from single datapipe?
4 participants