Skip to content

StringLike#linesIterator not properly splitting on \r #11241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lihaoyi opened this issue Nov 4, 2018 · 9 comments
Closed

StringLike#linesIterator not properly splitting on \r #11241

lihaoyi opened this issue Nov 4, 2018 · 9 comments

Comments

@lihaoyi
Copy link

lihaoyi commented Nov 4, 2018

This fails to split and returns a single string:

@ "hello\rworld".linesIterator.toArray.length
res5: Int = 1

This is inconsistent with the JVM String#lines: java.util.stream.Stream[String] method:

@ "hello\rworld".lines.toArray.length
res4: Int = 2

The standard java.io.BufferedReader#readLines method:

@
val x = new BufferedReader(new InputStreamReader(new ByteArrayInputStream("hello\rworld".getBytes)))
x: BufferedReader = java.io.BufferedReader@5769e7ae

@ x.readLine()
res2: String = "hello"

@ x.readLine()
res3: String = "world"

@ x.readLine()
res4: String = null

Or other languages like Python

>>> "hello\rworld".splitlines()
['hello', 'world']

Shouldn't it do the same thing?

@som-snytt
Copy link

scala 2.13.0-M5> "hello\n".stripLineEnd
res0: String = hello

scala 2.13.0-M5> "hello\r".stripLineEnd
"es1: String = "hello

scala 2.13.0-M5> "hello\r".stripLineEnd.length
res2: Int = 6

@SethTisue SethTisue added this to the 2.13.0-RC1 milestone Nov 4, 2018
@szeiger
Copy link

szeiger commented Nov 13, 2018

Should \r still be recognized as a line separator today? AFAIK it was only used by the old MacOS which got replaced almost 2 decades ago. Not recognizing \r makes the implementation simpler and probably faster, too.

@som-snytt
Copy link

Compatibility with String.lines is pretty compelling.

OTOH, a\rb could be understood as an overstrike, and therefore one line.

Padding of \r\n suggests it shouldn't be a heuristic, but a mode, so that in CRNL mode, \r\r\r\n and \r\u0000\n are a single line separator.

When I scalac file.scala > scala.out ; vim scala.out, vim picks a mode and shows ^M where scalac emits mixed output because it doesn't respect the line separator.

Since the future of scala is opinionated, it has the option of ignoring line separator and String.lines and solo CR.

@lihaoyi
Copy link
Author

lihaoyi commented Nov 14, 2018

Not just String.lines, but BufferedReader and all that. Surely wrapping a string in a new BufferedReader(new StringReader(s)) is simple enough in terms of implementation?

@som-snytt
Copy link

As another data point, CR is a line break for regexes, also matched by \R.

@adriaanm
Copy link
Contributor

Could someone take this one and implement compat with String.lines? WWJD FTW

@som-snytt
Copy link

Because Scala recognizes FF, I just juiced the current code.

I would propose that FF be taken only if not preceded by a line terminator, since normally, FF would not introduce a line:

"abc\n\f"   // one line? status quo is two lines
"abc\f\n"   // two lines

but for now I left it as it.

FF affects line numbering in the compiler, IIRC.

@SethTisue
Copy link
Member

SethTisue commented Jan 31, 2019

scala/scala#7699 doesn't recognize FF, since Java's String.lines doesn't either

@som-snytt
Copy link

Rust issue 55743 covers the same territory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants