Skip to content

Support CRLF as EOL #131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stasm opened this issue Jan 23, 2018 · 6 comments
Closed

Support CRLF as EOL #131

stasm opened this issue Jan 23, 2018 · 6 comments

Comments

@stasm
Copy link
Contributor

stasm commented Jan 23, 2018

Currently the parser only supports LF as the EOL character. Any files created on Windows are broken unless special care is taken to change their EOLs.

@stasm
Copy link
Contributor Author

stasm commented Jul 2, 2018

In #228 (comment), @zbraniecki wrote:

The patch looks good, although it could use a comment above just to call out Microsoft for stubbornness (they're moving to LF, just very slowly - like, with estimated switch by 2020 or something).

The question is for you if you want to take this change. This basically trades a couple lines of code and a bit of performance for supporting Windows editor line endings. Quite frankly, I struggle to justify it. CRLF is a leftover and in most environments I encountered engineers, including ones working on Windows, use LF and every software editor allows for an easy default to LF or even sets it on its own.

In result, I would be totally ok expecting our users to use LF only, but I'd prefer you to make this decision.

For the performance evaluation, all I'm gonna say is that the patch currently will scan every character in the whole FTL source string in all scenarios where the user does not use CRLF (default), on an off chance that somewhere there it'll find CRLF (edge case) and will then run a regexp to switch them all.
That feels like a wrong tradeoff.

We could reverse it and look for a \n not preceded by \r and assume if the first line is not, none will be, but that already makes assumptions, so not sure if you'd prefer that.

I'd be happy to expect a consumer operating in an environment where CRLF is supported to run such conversion before feeding Fluent API with the string, and say that FTL file should be encoded in UTF8 with LF for EOL, no BOM etc. It's 2018 and I believe we can stop paying runtime cost for failed experiments of the past that are being phased out, just slowly. :)

@stasm
Copy link
Contributor Author

stasm commented Jul 2, 2018

they're moving to LF, just very slowly - like, with estimated switch by 2020 or something

I've looked on the web but couldn't find anything to support this. Do you have a link?

The question is for you if you want to take this change. This basically trades a couple lines of code and a bit of performance for supporting Windows editor line endings. Quite frankly, I struggle to justify it. CRLF is a leftover and in most environments I encountered engineers, including ones working on Windows, use LF and every software editor allows for an easy default to LF or even sets it on its own.

I'm concerned about the first-run experience. Say a localizer wants to edit an FTL file. They download Atom or VS Code and start editing. Both of these editors default to CRLF on Windows. Or maybe they use Notepad, which now supports reading and editing LF, but still saves new files with CRLF.

I'm fine with not supporting CR alone; it's my understanding that it died together with Mac OS 9. However not supporting CRLF has a potential to hurt our most vulnerable users—those who don't know what went wrong nor how to fix it. For them, Fluent will be broken.

@blushingpenguin
Copy link
Contributor

If CR/CRLF are not supported as line endings, then the grammar should be updated as according to
the syntax in the specification:

line_end ::= "\u000D\u000A" | "\u000A" | "\u000D" | EOF

@stasm
Copy link
Contributor Author

stasm commented Jul 26, 2018

There's an open issue in the grammar spec repo about making this call: projectfluent/fluent#154. Let's more the decision making over there.

@stasm
Copy link
Contributor Author

stasm commented Aug 9, 2018

projectfluent/fluent#154 has been fixed. CR will no longer be a valid EOL when the Fluent Syntax Spec 0.7 is released in a few weeks. I think this means we can go back to discussing the implementation in fluent.js.

As per the Syntax spec, the supported line endings should be LF and CRLF. There's also an open issue about normalizing CRLF to LF in the AST (projectfluent/fluent#163); it shoudn't affect the parsing behavior however.

@stasm
Copy link
Contributor Author

stasm commented Oct 23, 2018

Support for the CRLF line endings was added to the runtime and the tooling parsers in #287.

@stasm stasm closed this as completed Oct 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants