Remove duplicated DataPipe reference from bucketbatcher #176

Closed
wants to merge 1 commit into from

ejguan
Contributor

@ejguan ejguan commented Jan 21, 2022

Stack from ghstack:

Summary:
Fixes #173

Note that the [input to `strip`](https://docs.python.org/3/library/stdtypes.html#str.strip)

> is a string specifying the **set of characters** to be removed. [Emphasis mine]

Thus, stripping works something like

```python
# Conceptual sketch only: the real strip touches just the ends of the string.
for char in chars:
    string = string.replace(char, "")
```

rather than

```python
string = string.replace(chars, "")
```

This means that always stripping `"\r\n"` is harmless even if the line terminator is only `"\n"` or `"\r"`.
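As a quick check of the claim above, here is a minimal sketch using only the built-in `str.strip`. The sample lines and variable names are made up for illustration and are not taken from the patched readers.

```python
# Hypothetical sample lines, each ending with a different terminator.
lines = ["unix\n", "windows\r\n", "old-mac\r"]

# strip treats "\r\n" as the set {"\r", "\n"}, so any leading or trailing
# run of those characters is removed, whichever terminator is present.
stripped = [line.strip("\r\n") for line in lines]
print(stripped)

# Characters in the interior of the string are left untouched.
interior = "a\nb\r\n".strip("\r\n")
print(repr(interior))
```

All three lines come out without their terminator, and the newline between `a` and `b` survives because `strip` only works on the ends of the string.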

Reviewed By: ejguan

Differential Revision: D33684458

Pulled By: NivekT

fbshipit-source-id: 9821b77d60d3afe038ae698965beefe319783aa1

[ghstack-poisoned]
@facebook-github-bot facebook-github-bot added the CLA Signed label Jan 21, 2022
ejguan added a commit that referenced this pull request Jan 21, 2022
ghstack-source-id: 37a119b
Pull Request resolved: #176
@ejguan ejguan changed the title from "fix newline stripping in plain text readers (#174)" to "Remove duplicated DataPipe reference from bucketbatcher" Jan 21, 2022
@ejguan ejguan closed this Jan 21, 2022
@facebook-github-bot facebook-github-bot deleted the gh/ejguan/15/head branch February 21, 2022 15:17