-
Notifications
You must be signed in to change notification settings - Fork 79
adds handling of CRLF to the FTL parser #228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@zbraniecki As someone more familiar with the |
Hi @stasm - sure I can! The patch looks good, although it could use a comment above just to call out Microsoft for stubbornness (they're moving to LF, just very slowly - like, with estimated switch by 2020 or something). The question is for you if you want to take this change. This basically trades a couple lines of code and a bit of performance for supporting Windows editor line endings. Quite frankly, I struggle to justify it. CRLF is a leftover and in most environments I encountered engineers, including ones working on Windows, use LF and every software editor allows for an easy default to LF or even sets it on its own. In result, I would be totally ok expecting our users to use LF only, but I'd prefer you to make this decision. For the performance evaluation, all I'm gonna say is that the patch currently will scan every character in the whole FTL source string in all scenarios where the user does not use CRLF (default), on an off chance that somewhere there it'll find CRLF (edge case) and will then run a regexp to switch them all. We could reverse it and look for a I'd be happy to expect a consumer operating in an environment where CRLF is supported to run such conversion before feeding Fluent API with the string, and say that FTL file should be encoded in UTF8 with LF for EOL, no BOM etc. It's 2018 and I believe we can stop paying runtime cost for failed experiments of the past that are being phased out, just slowly. :) |
I've looked on the web but couldn't find anything to support this. Do you have a link?
I'm concerned about the first-run experience. Say a localizer wants to edit an FTL file. They download Atom or VS Code and start editing. Both of these editors default to CRLF on Windows. Or maybe they use Notepad, which now supports reading and editing LF, but still saves new files with CRLF. I'm fine with not supporting CR alone; it's my understanding that it died together with Mac OS 9. However not supporting CRLF has a potential to hurt our most vulnerable users—those who don't know what went wrong nor how to fix it. For them, Fluent will be broken. |
Let's move the discussion about whether we should support CRLF to #131 and keep this thread for comments about the implementation. |
Is there any other way than to call |
Yes, it's possible to bake support into the parser. I've actually just finished porting the parser to c# (https://github.com/blushingpenguin/Fluent.Net) and I did that at the same time. If I bake the support in directly rather than the kludgy regexp is this something that you'd accept as a PR? |
Yes, absolutely! Let's first decide which line ending we want to support in projectfluent/fluent#154 and we can then move on to the implementation. Thanks, @blushingpenguin! |
I ended up fixing this inside of the Thank you for the PR, @blushingpenguin, and please keep them coming! |
Currently '\r' characters appear if the FTL file has CRLF line endings -- this makes the parser check for them, and strip them out which is useful for FTL files edited on Windows