Skip to content

adds handling of CRLF to the FTL parser #228

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

blushingpenguin
Copy link
Contributor

Currently '\r' characters appear if the FTL file has CRLF line endings -- this makes the parser check for them, and strip them out which is useful for FTL files edited on Windows

@stasm
Copy link
Contributor

stasm commented Jun 22, 2018

@zbraniecki As someone more familiar with the ParserStream code, would you mind taking a look at this? Thanks!

@zbraniecki
Copy link
Collaborator

zbraniecki commented Jun 22, 2018

Hi @stasm - sure I can!

The patch looks good, although it could use a comment above just to call out Microsoft for stubbornness (they're moving to LF, just very slowly - like, with estimated switch by 2020 or something).

The question is for you if you want to take this change. This basically trades a couple lines of code and a bit of performance for supporting Windows editor line endings. Quite frankly, I struggle to justify it. CRLF is a leftover and in most environments I encountered engineers, including ones working on Windows, use LF and every software editor allows for an easy default to LF or even sets it on its own.

In result, I would be totally ok expecting our users to use LF only, but I'd prefer you to make this decision.

For the performance evaluation, all I'm gonna say is that the patch currently will scan every character in the whole FTL source string in all scenarios where the user does not use CRLF (default), on an off chance that somewhere there it'll find CRLF (edge case) and will then run a regexp to switch them all.
That feels like a wrong tradeoff.

We could reverse it and look for a \n not preceded by \r and assume if the first line is not, none will be, but that already makes assumptions, so not sure if you'd prefer that.

I'd be happy to expect a consumer operating in an environment where CRLF is supported to run such conversion before feeding Fluent API with the string, and say that FTL file should be encoded in UTF8 with LF for EOL, no BOM etc. It's 2018 and I believe we can stop paying runtime cost for failed experiments of the past that are being phased out, just slowly. :)

@stasm
Copy link
Contributor

stasm commented Jul 2, 2018

they're moving to LF, just very slowly - like, with estimated switch by 2020 or something

I've looked on the web but couldn't find anything to support this. Do you have a link?

The question is for you if you want to take this change. This basically trades a couple lines of code and a bit of performance for supporting Windows editor line endings. Quite frankly, I struggle to justify it. CRLF is a leftover and in most environments I encountered engineers, including ones working on Windows, use LF and every software editor allows for an easy default to LF or even sets it on its own.

I'm concerned about the first-run experience. Say a localizer wants to edit an FTL file. They download Atom or VS Code and start editing. Both of these editors default to CRLF on Windows. Or maybe they use Notepad, which now supports reading and editing LF, but still saves new files with CRLF.

I'm fine with not supporting CR alone; it's my understanding that it died together with Mac OS 9. However not supporting CRLF has a potential to hurt our most vulnerable users—those who don't know what went wrong nor how to fix it. For them, Fluent will be broken.

@stasm
Copy link
Contributor

stasm commented Jul 2, 2018

Let's move the discussion about whether we should support CRLF to #131 and keep this thread for comments about the implementation.

@stasm stasm mentioned this pull request Jul 2, 2018
@stasm
Copy link
Contributor

stasm commented Jul 2, 2018

Is there any other way than to call .replace() on the entire source? I'm hoping that it's possible to bake CRLF deep into ParserStream. @zbraniecki?

@blushingpenguin
Copy link
Contributor Author

Yes, it's possible to bake support into the parser. I've actually just finished porting the parser to c# (https://github.com/blushingpenguin/Fluent.Net) and I did that at the same time. If I bake the support in directly rather than the kludgy regexp is this something that you'd accept as a PR?

@stasm
Copy link
Contributor

stasm commented Jul 26, 2018

Yes, absolutely! Let's first decide which line ending we want to support in projectfluent/fluent#154 and we can then move on to the implementation. Thanks, @blushingpenguin!

@stasm
Copy link
Contributor

stasm commented Dec 14, 2018

I ended up fixing this inside of the ParserStream class which is responsible for iterating over the source string. See 61187c0#diff-9b489d1af94323e593499f85bbc6355f and e5cf0c0#diff-9b489d1af94323e593499f85bbc6355f. The fix shipped in fluent-syntax 0.9.0.

Thank you for the PR, @blushingpenguin, and please keep them coming!

@stasm stasm closed this Dec 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants