Added default no-op function for flatmap #749

mobikoz · 2022-08-22T09:31:15Z

Fixes #738

Changes

in datapipe's common functions:

added _no_op_fn definition

in FlatMapperIterDataPipe class

used _no_op_fn as the default callable when no function is passed

facebook-github-bot · 2022-08-22T09:31:19Z

Hi @mobikoz!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

NivekT

Thanks for contributing this PR. Just some nit comments.

torchdata/datapipes/iter/transform/callable.py

mobikoz · 2022-08-23T13:31:08Z

Waiting for the company CLA signature; it should be done shortly.

NivekT

One last thing - can you add a unit test around here?

https://github.com/pytorch/data/blob/main/test/test_iterdatapipe.py#L692

Thanks!

facebook-github-bot · 2022-08-24T15:04:46Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

mobikoz · 2022-08-24T15:08:07Z

Unit test added, also I fixed _no_op_fn to always return iterables.

NivekT

See comment about _no_op_fn, let me know what you think.

NivekT · 2022-08-24T21:08:13Z

torchdata/datapipes/utils/common.py

+    if len(args) == 1 and isinstance(args[0], (list, tuple)):
+        return args[0]
+    else:
+        return args


Suggested change

-    if len(args) == 1 and isinstance(args[0], (list, tuple)):
-        return args[0]
-    else:
-        return args
+    if len(args) == 1:
+        try:
+            return iter(args[0])
+        except TypeError:
+            pass
+    return args

@mobikoz What do you think of this instead? This will allow objects that are not list or tuple to be unpacked (e.g. numpy.array).

cc: @ejguan
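To make the difference between the two variants concrete, here is a minimal standalone sketch. The names no_op_isinstance, no_op_iter and Countdown are hypothetical and exist only for this illustration; they are not part of torchdata, and the real _no_op_fn may differ in detail.

```python
def no_op_isinstance(*args):
    # Variant from the diff: unwrap only a single list or tuple argument.
    if len(args) == 1 and isinstance(args[0], (list, tuple)):
        return args[0]
    return args


def no_op_iter(*args):
    # Variant from the suggested change: unwrap any single iterable argument.
    if len(args) == 1:
        try:
            return iter(args[0])
        except TypeError:
            pass
    return args


class Countdown:
    # Hypothetical iterable that is neither a list nor a tuple.
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n, 0, -1))


# The isinstance variant leaves the object wrapped in the args tuple.
print(len(no_op_isinstance(Countdown(3))))  # 1
# The iter variant unpacks it.
print(list(no_op_iter(Countdown(3))))  # [3, 2, 1]
# A non-iterable falls through to the args tuple.
print(list(no_op_iter(8)))  # [8]
```

One side effect worth keeping in mind with the iter variant: strings are iterable too, so a single string argument would be split into characters.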

ejguan · 2022-08-24

Can I know why we want to differentiate between the size-1 case and other sizes?

list(dp) # [[1], [2], [3], [4]]
list(dp.flatmap()) # I would expect the result to be [1, 2, 3, 4]

BTW, if this function is not commonly used by other DataPipes, we can put it in the callable.py file.

NivekT · 2022-08-24

"Can I know why we want to differentiate between the size-1 case and other sizes?"

As far as I can tell, because the input to the function is *args, there are two main cases where size == 1:

input_col=None

input_col only specifies one column

Moving it from common.py to callable.py makes sense to me.

mobikoz · 2022-08-25

@ejguan This differentiation is a consequence of using variable-length *args to support the input_col parameter.
If it is not handled, then for the given example:
list(dp) # [[1], [2], [3], [4]]
list(dp.flatmap()) # the result will be [[1], [2], [3], [4]]

This is because args is always a tuple; in this case, e.g., ([1],) for the first element.
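The point about *args can be checked in plain Python, without torchdata. In this sketch, passthrough, unwrap_single and flatten are hypothetical helpers that only mimic the shape of the discussion (a function applied per element, followed by yield from); they are not the library's code.

```python
def passthrough(*args):
    # Returns the args tuple untouched, so a single argument stays wrapped.
    return args


def unwrap_single(*args):
    # Returns the lone argument itself when exactly one was passed.
    if len(args) == 1:
        return args[0]
    return args


def flatten(source, fn):
    # Stand-in for the flatmap loop: apply fn, then unpack its result.
    for item in source:
        yield from fn(item)


data = [[1], [2], [3], [4]]
print(passthrough([1]))  # ([1],)
print(list(flatten(data, passthrough)))  # [[1], [2], [3], [4]]
print(list(flatten(data, unwrap_single)))  # [1, 2, 3, 4]
```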

ejguan · 2022-08-25

I see, makes sense. Thanks for the explanation.

mobikoz · 2022-08-26

Currently it works like this:
list(dp) # [[1, 2, 3], [4, 5, 6]]
list(dp.flatmap(input_col=1)) # [2, 5]

list(dp) # [1, 2, 3, 4, 5, 6]
list(dp.flatmap()) # [1, 2, 3, 4, 5, 6]

Isn't that the expected behavior?
Just double-checking that I understand correctly.

mobikoz · 2022-08-26

BTW, this is analogous to calling flatmap with some fn, e.g.

def fn(e): return [e, e * 10]

NivekT · 2022-08-26

I might be wrong, but you might want to do this if all you want is to select an index:

from operator import itemgetter
dp = IterableWrapper([[1, 2, 3], [4, 5, 6]]).map(itemgetter(1))
list(dp) # [2, 5]

The case I worry about is something like this:

dp = IterableWrapper([[1, [2, 2], 3], [4, [5, 5], 6], [7, 8, 9], [10, [11, 11], 12]]).flatmap(input_col=1)

The 3rd list will silently unpack to 8 even though it is different from the other lists.

@ejguan any thoughts on which behavior is better?
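The silent-unpack concern can be reproduced with a small simulation. This is only a sketch: flatmap_column is a hypothetical stand-in for flatmap(input_col=...) with no function given, and no_op_iter mirrors the try/iter variant from the suggested change rather than the code that was eventually merged.

```python
def no_op_iter(*args):
    # Try/iter variant: unpack a single iterable, otherwise return args.
    if len(args) == 1:
        try:
            return iter(args[0])
        except TypeError:
            pass
    return args


def flatmap_column(rows, col):
    # Hypothetical stand-in: pick one column, then unpack the no-op result.
    for row in rows:
        yield from no_op_iter(row[col])


rows = [[1, [2, 2], 3], [4, [5, 5], 6], [7, 8, 9], [10, [11, 11], 12]]
# The scalar 8 passes through without any error or warning.
print(list(flatmap_column(rows, 1)))  # [2, 2, 5, 5, 8, 11, 11]
```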

ejguan · 2022-08-26

How about this?

list(dp) # [[1, [2], 3], [4, [5], 6]]
list(dp.flatmap(input_col=1)) # I expect [2, 5] rather than [[2], [5]]

I feel like it's fine to remove iter from the current implementation, as long as we make it clear in the comment that we only allow output that can be unpacked by yield from.
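One possible reading of this suggestion, sketched in plain Python: drop the iter fallback and return the single argument as-is, so a value that yield from cannot unpack fails loudly. The names no_op and flatmap_column are hypothetical, and this is an assumption about the intent, not a copy of the merged implementation.

```python
def no_op(*args):
    # Assumed simplified variant: the result must be unpackable by yield from.
    if len(args) == 1:
        return args[0]
    return args


def flatmap_column(rows, col):
    # Hypothetical stand-in for flatmap(input_col=col) with no function given.
    for row in rows:
        yield from no_op(row[col])


print(list(flatmap_column([[1, [2], 3], [4, [5], 6]], 1)))  # [2, 5]

# A scalar column value is no longer unpacked silently.
try:
    list(flatmap_column([[7, 8, 9]], 1))
except TypeError as exc:
    print("TypeError:", exc)
```

The trade-off in this sketch is that mixed columns raise an error instead of producing output that looks valid, which addresses the silent-unpack case raised earlier in the thread.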

mobikoz · 2022-08-29

Pushed the corrected implementation.

facebook-github-bot · 2022-08-25T13:29:31Z

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

NivekT

See comment for one last change and we will merge. Thank you!

facebook-github-bot · 2022-08-29T14:00:21Z

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

NivekT

Thank you so much!

added default no-op parameter for flatmap

3a8360d

NivekT reviewed Aug 22, 2022

NivekT linked an issue Aug 22, 2022 that may be closed by this pull request

Chainer/Concater from single datapipe? #648

Closed

review changes

fdb62ff

NivekT reviewed Aug 23, 2022

Unit tests update for flatmap, _no_op function fix

60cbe68

facebook-github-bot added the CLA Signed label Aug 24, 2022

mobikoz marked this pull request as ready for review August 24, 2022 15:08

NivekT reviewed Aug 24, 2022

Review changes no. 2

0c33295

mobikoz force-pushed the flatmap_default branch from b65d5f5 to 0c33295 Compare August 25, 2022 09:24

NivekT approved these changes Aug 25, 2022

review changes no.3

27cd363

NivekT approved these changes Aug 29, 2022

facebook-github-bot closed this in 1947599 Aug 29, 2022
