adds handling of CRLF to the FTL parser #228

blushingpenguin · 2018-06-21T15:58:20Z

Currently '\r' characters appear if the FTL file has CRLF line endings -- this makes the parser check for them, and strip them out which is useful for FTL files edited on Windows

stasm · 2018-06-22T09:06:52Z

@zbraniecki As someone more familiar with the ParserStream code, would you mind taking a look at this? Thanks!

zbraniecki · 2018-06-22T17:36:24Z

Hi @stasm - sure I can!

The patch looks good, although it could use a comment above just to call out Microsoft for stubbornness (they're moving to LF, just very slowly - like, with estimated switch by 2020 or something).

The question is for you if you want to take this change. This basically trades a couple lines of code and a bit of performance for supporting Windows editor line endings. Quite frankly, I struggle to justify it. CRLF is a leftover and in most environments I encountered engineers, including ones working on Windows, use LF and every software editor allows for an easy default to LF or even sets it on its own.

In result, I would be totally ok expecting our users to use LF only, but I'd prefer you to make this decision.

For the performance evaluation, all I'm gonna say is that the patch currently will scan every character in the whole FTL source string in all scenarios where the user does not use CRLF (default), on an off chance that somewhere there it'll find CRLF (edge case) and will then run a regexp to switch them all.
That feels like a wrong tradeoff.

We could reverse it and look for a \n not preceded by \r and assume if the first line is not, none will be, but that already makes assumptions, so not sure if you'd prefer that.

I'd be happy to expect a consumer operating in an environment where CRLF is supported to run such conversion before feeding Fluent API with the string, and say that FTL file should be encoded in UTF8 with LF for EOL, no BOM etc. It's 2018 and I believe we can stop paying runtime cost for failed experiments of the past that are being phased out, just slowly. :)

stasm · 2018-07-02T15:23:36Z

they're moving to LF, just very slowly - like, with estimated switch by 2020 or something

I've looked on the web but couldn't find anything to support this. Do you have a link?

The question is for you if you want to take this change. This basically trades a couple lines of code and a bit of performance for supporting Windows editor line endings. Quite frankly, I struggle to justify it. CRLF is a leftover and in most environments I encountered engineers, including ones working on Windows, use LF and every software editor allows for an easy default to LF or even sets it on its own.

I'm concerned about the first-run experience. Say a localizer wants to edit an FTL file. They download Atom or VS Code and start editing. Both of these editors default to CRLF on Windows. Or maybe they use Notepad, which now supports reading and editing LF, but still saves new files with CRLF.

I'm fine with not supporting CR alone; it's my understanding that it died together with Mac OS 9. However not supporting CRLF has a potential to hurt our most vulnerable users—those who don't know what went wrong nor how to fix it. For them, Fluent will be broken.

stasm · 2018-07-02T15:25:08Z

Let's move the discussion about whether we should support CRLF to #131 and keep this thread for comments about the implementation.

stasm · 2018-07-02T15:27:20Z

Is there any other way than to call .replace() on the entire source? I'm hoping that it's possible to bake CRLF deep into ParserStream. @zbraniecki?

blushingpenguin · 2018-07-26T06:15:59Z

Yes, it's possible to bake support into the parser. I've actually just finished porting the parser to c# (https://github.com/blushingpenguin/Fluent.Net) and I did that at the same time. If I bake the support in directly rather than the kludgy regexp is this something that you'd accept as a PR?

stasm · 2018-07-26T07:03:39Z

Yes, absolutely! Let's first decide which line ending we want to support in projectfluent/fluent#154 and we can then move on to the implementation. Thanks, @blushingpenguin!

stasm · 2018-12-14T11:54:05Z

I ended up fixing this inside of the ParserStream class which is responsible for iterating over the source string. See 61187c0#diff-9b489d1af94323e593499f85bbc6355f and e5cf0c0#diff-9b489d1af94323e593499f85bbc6355f. The fix shipped in fluent-syntax 0.9.0.

Thank you for the PR, @blushingpenguin, and please keep them coming!

handle crlf

d5af7a2

stasm mentioned this pull request Jul 2, 2018

Parse Comments as standalone if they're attached to Junk #235

Merged

stasm mentioned this pull request Jul 2, 2018

Support CRLF as EOL #131

Closed

stasm closed this Dec 14, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

adds handling of CRLF to the FTL parser #228

adds handling of CRLF to the FTL parser #228

Uh oh!

blushingpenguin commented Jun 21, 2018

Uh oh!

stasm commented Jun 22, 2018

Uh oh!

zbraniecki commented Jun 22, 2018 •

edited

Loading

Uh oh!

stasm commented Jul 2, 2018

Uh oh!

stasm commented Jul 2, 2018

Uh oh!

stasm commented Jul 2, 2018

Uh oh!

blushingpenguin commented Jul 26, 2018

Uh oh!

stasm commented Jul 26, 2018

Uh oh!

stasm commented Dec 14, 2018

Uh oh!

Uh oh!

adds handling of CRLF to the FTL parser #228

adds handling of CRLF to the FTL parser #228

Uh oh!

Conversation

blushingpenguin commented Jun 21, 2018

Uh oh!

stasm commented Jun 22, 2018

Uh oh!

zbraniecki commented Jun 22, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stasm commented Jul 2, 2018

Uh oh!

stasm commented Jul 2, 2018

Uh oh!

stasm commented Jul 2, 2018

Uh oh!

blushingpenguin commented Jul 26, 2018

Uh oh!

stasm commented Jul 26, 2018

Uh oh!

stasm commented Dec 14, 2018

Uh oh!

Uh oh!

zbraniecki commented Jun 22, 2018 •

edited

Loading