Skip to content
This repository was archived by the owner on Feb 10, 2025. It is now read-only.

Don't allow CR to act as newline when it is followed by LF #81

Merged
merged 5 commits into from
Oct 9, 2024

Conversation

DanTup
Copy link
Contributor

@DanTup DanTup commented Oct 3, 2024

This fixes an error I encountered (dart-lang/tools#1797) when scanning a zero-length match when between a CR and LF. It removes some special handling of \r's that seemed to complicate things. Now, \r is just ignored and only \n is treated as the end of a line.

Note: This partially reverts some changes made in 27d20a3 that seemed to be the cause of the error - however the changes added in that change still pass with my changes, and I added some additional tests.

Fixes dart-lang/tools#1797

This fixes an error I encountered (https://github.com/dart-lang/string_scanner/issues/80) when scanning a zero-length match when between a CR and LF. It removes some special handling of \r's that seemed to complicate things. Now, \r is just ignored and only \n is treated as the end of a line.

Note: This partially reverts some changes made in 27d20a3 that seemed to be the cause of the error - however the changes added in that change still pass with my changes, and I added some additional tests.

Fixes https://github.com/dart-lang/string_scanner/issues/80
@DanTup
Copy link
Contributor Author

DanTup commented Oct 3, 2024

@nex3 interested in your thoughts on this since you seem to have committed here recently and made the commit noted above that this changes.

@DanTup
Copy link
Contributor Author

DanTup commented Oct 3, 2024

Looking at 27d20a3 again, I think the intention may have been to also handle a \r that is not followed by \n as a new line:

https://github.com/dart-lang/string_scanner/blob/2139417ffcd0392bde3ba9bc83ee13eaa5fbed01/lib/src/line_scanner.dart#L94

So it's possible this change is not completely consistent (and more tests are needed). However, I'm unsure about this behaviour - should a bare \r really start a new line? I feel like it should not, and some testing from a terminal on Windows does not treat it like a new line:

(note: PowerShell uses backticks for escape characters)

PS D:\> echo "abc`ndef"
abc
def
PS D:\> echo "abc`r`ndef"
abc
def
PS D:\> echo "abc`rdef"
def

We should agree what the behaviour should be here and add additional tests.

@devoncarew
Copy link
Contributor

should a bare \r really start a new line

I think practically only \r\n and \n should be treated as line terminators (historically - and not something to take into account here - I remember macos before OS X used \r as a terminator).

I'm less familiar w/ this package specifically but am comforted by existing + new tests passing. If there are more tests you feel we should have, please add them :)

@@ -61,14 +58,14 @@ class LineScanner extends StringScanner {
}
} else {
final newlines = _newlinesIn(string.substring(newPosition, oldPosition));
if (_betweenCRLF) newlines.removeLast();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line has likely been a bug since this landed. It was introduced during code review.

@natebosch
Copy link
Contributor

natebosch commented Oct 4, 2024

comforted by existing + new tests passing

I'm less comforted to find that we might be missing tests covering that behavior, since the behavior changed during code review and no one noticed.
Of course it's also a good sign that it's been behaving differently than was originally intended for almost a decade and no one noticed, so it's probably not an edge of the behavior we're really relying on.

I will run this against a larger set of tests and see if anything crops up.

Edit: Although I don't know if it adds much confidence, since I won't be running any windows tests.

@nex3
Copy link
Contributor

nex3 commented Oct 4, 2024

The intention was that \r\n would be considered a newline, and that \r followed by anything other than \n would be considered a newline. While modern OSes don't use \r as a stand-alone newline, it is supported by modern software, including for example the CSS specification. I think any changes here should preserve that behavior.

@DanTup
Copy link
Contributor Author

DanTup commented Oct 7, 2024

I have pushed changes to handle \r that is not followed by \n as a newline. While I'm not sure if all users would expect this behaviour, it appears to be the original intention (and mostly supported).

The changes were a little complex when moving the position backwards, because the original code searched for the last newline when computing the new column, but if that newline was two characters, the column would be out by one (because it used the offset of the \r and therefore counted the \n as the first character of the line).

I've added many more tests and everything is passing, but please review carefully and let me know if you think there are additional cases we need tests for.

Copy link
Contributor

@natebosch natebosch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Thanks for following through with the solution that handles solo /r

@natebosch natebosch changed the title Remove special handling of CR LF and just only treat LF as newline Don't allow CR to act as newline when it is followed by LF Oct 8, 2024
@DanTup
Copy link
Contributor Author

DanTup commented Oct 9, 2024

I think I found a bug here, going to add some new tests and fix if necessary.

@DanTup
Copy link
Contributor Author

DanTup commented Oct 9, 2024

Pushed some additional tests + fixes for:

  • moving backwards with position from between \r\n would incorrectly handle the \r\n because _newlinesIn always assumed the string it was checking ended at position (but it doesn't when moving backwards), so now the end position is passed in.
  • moving forwards from a non-zero character would compute the wrong character because it used the offset of the last newline (match.end) without accounting for the search string only being a subset of the full string (this was an existing bug so I also added to the changelog)

@natebosch natebosch merged commit 084b201 into dart-archive:master Oct 9, 2024
6 checks passed
@DanTup DanTup deleted the fix-newlines branch October 10, 2024 09:48
mosuem pushed a commit to dart-lang/tools that referenced this pull request Dec 11, 2024
Fixes https://github.com/dart-lang/string_scanner/issues/80

This fixes an error when scanning a zero-length match when between a CR and LF. 


* Fix typo

* comment nits

* Fix some bugs when setting position if the current position is between \r\n

---------

Co-authored-by: Nate Bosch <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

RangeError when input string contains \r\n and a pattern matches an empty string
4 participants